- 02 6月, 2020 1 次提交
-
-
由 Lorenzo Bianconi 提交于
In order to use standard 'xdp' prefix, rename convert_to_xdp_frame utility routine in xdp_convert_buff_to_frame and replace all the occurrences Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org
-
- 15 5月, 2020 1 次提交
-
-
由 Jesper Dangaard Brouer 提交于
The virtio_net driver is running inside the guest-OS. There are two XDP receive code-paths in virtio_net, namely receive_small() and receive_mergeable(). The receive_big() function does not support XDP. In receive_small() the frame size is available in buflen. The buffer backing these frames are allocated in add_recvbuf_small() with same size, except for the headroom, but tailroom have reserved room for skb_shared_info. The headroom is encoded in ctx pointer as a value. In receive_mergeable() the frame size is more dynamic. There are two basic cases: (1) buffer size is based on a exponentially weighted moving average (see DECLARE_EWMA) of packet length. Or (2) in case virtnet_get_headroom() have any headroom then buffer size is PAGE_SIZE. The ctx pointer is this time used for encoding two values; the buffer len "truesize" and headroom. In case (1) if the rx buffer size is underestimated, the packet will have been split over more buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in top of buffer area). If that happens the XDP path does a xdp_linearize_page operation. V3: Adjust frame_sz in receive_mergeable() case, spotted by Jason Wang. The code is really hard to follow, so some hints to reviewers. The receive_mergeable() case gets frames that were allocated in add_recvbuf_mergeable() which uses headroom=virtnet_get_headroom(), and 'buf' ptr is advanced this headroom. The headroom can only be 0 or VIRTIO_XDP_HEADROOM, as virtnet_get_headroom is really simple: static unsigned int virtnet_get_headroom(struct virtnet_info *vi) { return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0; } As frame_sz is an offset size from xdp.data_hard_start, reviewers should notice how this is calculated in receive_mergeable(): int offset = buf - page_address(page); [...] data = page_address(xdp_page) + offset; xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len; The calculated offset will always be VIRTIO_XDP_HEADROOM when reaching this code. Thus, xdp.data_hard_start will be page-start address plus vi->hdr_len. Given this xdp.frame_sz need to be reduced with vi->hdr_len size. IMHO a followup patch should cleanup this code to make it easier to maintain and understand, but it is outside the scope of this patchset. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Acked-by: NJason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945344436.97035.9445115070189151680.stgit@firesoul
-
- 07 5月, 2020 1 次提交
-
-
由 Michael S. Tsirkin 提交于
When we fill up a receive VQ, try_fill_recv currently tries to count kicks using a 64 bit stats counter. Turns out, on a 32 bit kernel that uses a seqcount. sequence counts are "lock" constructs where you need to make sure that writers are serialized. In turn, this means that we mustn't run two try_fill_recv concurrently. Which of course we don't. We do run try_fill_recv sometimes from a softirq napi context, and sometimes from a fully preemptible context, but the later always runs with napi disabled. However, when it comes to the seqcount, lockdep is trying to enforce the rule that the same lock isn't accessed from preemptible and softirq context - it doesn't know about napi being enabled/disabled. This causes a false-positive warning: WARNING: inconsistent lock state ... inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. As a work around, shut down the warning by switching to u64_stats_update_begin_irqsave - that works by disabling interrupts on 32 bit only, is a NOP on 64 bit. Reported-by: NThomas Gleixner <tglx@linutronix.de> Suggested-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 3月, 2020 1 次提交
-
-
由 Jakub Kicinski 提交于
Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver correctly rejects all unsupported parameters. As a side effect of these changes the error code for unsupported params changes from EINVAL to EOPNOTSUPP. v2: correctly handle rx-frames (and adjust the commit msg) v3: adjust commit message for new error code and member name Signed-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 3月, 2020 1 次提交
-
-
由 Cris Forno 提交于
With the ethtool_virtdev_set_link_ksettings function in core/ethtool.c, ibmveth, netvsc, and virtio now use the core's helper function. Funtionality changes that pertain to ibmveth driver include: 1. Changed the initial hardcoded link speed to 1GB. 2. Added support for allowing a user to change the reported link speed via ethtool. Functionality changes to the netvsc driver include: 1. When netvsc_get_link_ksettings is called, it will defer to the VF device if it exists to pull accelerated networking values, otherwise pull default or user-defined values. 2. Similarly, if netvsc_set_link_ksettings called and a VF device exists, the real values of speed and duplex are changed. Signed-off-by: NCris Forno <cforno12@linux.vnet.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 2月, 2020 2 次提交
-
-
由 Yuya Kusakabe 提交于
Implement support for transferring XDP meta data into skb for virtio_net driver; before calling into the program, xdp.data_meta points to xdp.data, where on program return with pass verdict, we call into skb_metadata_set(). Tested with the script at https://github.com/higebu/virtio_net-xdp-metadata-test. Signed-off-by: NYuya Kusakabe <yuya.kusakabe@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/bpf/20200225033212.437563-2-yuya.kusakabe@gmail.com
-
由 Yuya Kusakabe 提交于
We do not want to care about the vnet header in receive_small() if XDP is loaded, since we can not know whether or not the packet is modified by XDP. Fixes: f6b10209 ("virtio-net: switch to use build_skb() for small buffer") Signed-off-by: NYuya Kusakabe <yuya.kusakabe@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/bpf/20200225033212.437563-1-yuya.kusakabe@gmail.com
-
- 27 1月, 2020 1 次提交
-
-
由 John Fastabend 提交于
virtio_net currently relies on rcu critical section to access the xdp program in its xdp_xmit handler. However, the pointer to the xdp program is only used to do a NULL pointer comparison to determine if xdp is enabled or not. Use rcu_access_pointer() instead of rcu_dereference() to reflect this. Then later when we drop rcu_read critical section virtio_net will not need in special handling. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/1580084042-11598-3-git-send-email-john.fastabend@gmail.com
-
- 17 1月, 2020 1 次提交
-
-
由 Toke Høiland-Jørgensen 提交于
Since the bulk queue used by XDP_REDIRECT now lives in struct net_device, we can re-use the bulking for the non-map version of the bpf_redirect() helper. This is a simple matter of having xdp_do_redirect_slow() queue the frame on the bulk queue instead of sending it out with __bpf_tx_xdp(). Unfortunately we can't make the bpf_redirect() helper return an error if the ifindex doesn't exit (as bpf_redirect_map() does), because we don't have a reference to the network namespace of the ingress device at the time the helper is called. So we have to leave it as-is and keep the device lookup in xdp_do_redirect_slow(). Since this leaves less reason to have the non-map redirect code in a separate function, so we get rid of the xdp_do_redirect_slow() function entirely. This does lose us the tracepoint disambiguation, but fortunately the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint entry structures. This means both can contain a map index, so we can just amend the tracepoint definitions so we always emit the xdp_redirect(_err) tracepoints, but with the map ID only populated if a map is present. This means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep the definitions around in case someone is still listening for them. With this change, the performance of the xdp_redirect sample program goes from 5Mpps to 8.4Mpps (a 68% increase). Since the flush functions are no longer map-specific, rename the flush() functions to drop _map from their names. One of the renamed functions is the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To keep from having to update all drivers, use a #define to keep the old name working, and only update the virtual drivers in this patch. Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
-
- 18 11月, 2019 1 次提交
-
-
由 Andrii Nakryiko 提交于
Similarly to bpf_map's refcnt/usercnt, convert bpf_prog's refcnt to atomic64 and remove artificial 32k limit. This allows to make bpf_prog's refcounting non-failing, simplifying logic of users of bpf_prog_add/bpf_prog_inc. Validated compilation by running allyesconfig kernel build. Suggested-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAndrii Nakryiko <andriin@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191117172806.2195367-3-andriin@fb.com
-
- 02 10月, 2019 1 次提交
-
-
由 Florian Westphal 提交于
commit 174e2381 ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi recycle always drop skb extensions. The additional skb_ext_del() that is performed via nf_reset on napi skb recycle is not needed anymore. Most nf_reset() calls in the stack are there so queued skb won't block 'rmmod nf_conntrack' indefinitely. This removes the skb_ext_del from nf_reset, and renames it to a more fitting nf_reset_ct(). In a few selected places, add a call to skb_ext_reset to make sure that no active extensions remain. I am submitting this for "net", because we're still early in the release cycle. The patch applies to net-next too, but I think the rename causes needless divergence between those trees. Suggested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 04 9月, 2019 1 次提交
-
-
由 ? jiang 提交于
This change lowers ring buffer reclaim threshold from 1/2*queue to budget for better performance. According to our test with qemu + dpdk, packet dropping happens when the guest is not able to provide free buffer in avail ring timely with default 1/2*queue. The value in the patch has been tested and does show better performance. Test setup: iperf3 to generate packets to guest (total 30mins, pps 400k, UDP) avg packets drop before: 2842 avg packets drop after: 360(-87.3%) Further, current code suffers from a starvation problem: the amount of work done by try_fill_recv is not bounded by the budget parameter, thus (with large queues) once in a while userspace gets blocked for a long time while queue is being refilled. Trigger refills earlier to make sure the amount of work to do is limited. Signed-off-by: Njiangkidd <jiangkidd@hotmail.com> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 15 6月, 2019 1 次提交
-
-
由 Willem de Bruijn 提交于
NAPI tx mode improves TCP behavior by enabling TCP small queues (TSQ). TSQ reduces queuing ("bufferbloat") and burstiness. Previous measurements have shown significant improvement for TCP_STREAM style workloads. Such as those in commit 86a5df14 ("Merge branch 'virtio-net-tx-napi'"). There has been uncertainty about smaller possible regressions in latency due to increased reliance on tx interrupts. The above results did not show that, nor did I observe this when rerunning TCP_RR on Linux 5.1 this week on a pair of guests in the same rack. This may be subject to other settings, notably interrupt coalescing. In the unlikely case of regression, we have landed a credible runtime solution. Ethtool can configure it with -C tx-frames [0|1] as of commit 0c465be1 ("virtio_net: ethtool tx napi configuration"). NAPI tx mode has been the default in Google Container-Optimized OS (COS) for over half a year, as of release M70 in October 2018, without any negative reports. Link: https://marc.info/?l=linux-netdev&m=149305618416472 Link: https://lwn.net/Articles/507065/Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Acked-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 5月, 2019 1 次提交
-
-
由 Thomas Gleixner 提交于
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details [based] [from] [clk] [highbank] [c] you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 355 file(s). Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org> Reviewed-by: NJilayne Lovejoy <opensource@jilayne.com> Reviewed-by: NSteve Winslow <swinslow@gmail.com> Reviewed-by: NAllison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190519154041.837383322@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 07 4月, 2019 2 次提交
-
-
由 Yuval Shaia 提交于
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuval Shaia 提交于
This header is not in use - remove it. Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 4月, 2019 1 次提交
-
-
由 Florian Westphal 提交于
There are two reasons for this. First, the xmit_more flag conceptually doesn't fit into the skb, as xmit_more is not a property related to the skb. Its only a hint to the driver that the stack is about to transmit another packet immediately. Second, it was only done this way to not have to pass another argument to ndo_start_xmit(). We can place xmit_more in the softnet data, next to the device recursion. The recursion counter is already written to on each transmit. The "more" indicator is placed right next to it. Drivers can use the netdev_xmit_more() helper instead of skb->xmit_more to check the "more packets coming" hint. skb->xmit_more is retained (but always 0) to not cause build breakage. This change takes care of the simple s/skb->xmit_more/netdev_xmit_more()/ conversions. Remaining drivers are converted in the next patches. Suggested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 3月, 2019 1 次提交
-
-
由 Peter Xu 提交于
The variable is never used. CC: Michael S. Tsirkin <mst@redhat.com> CC: Jason Wang <jasowang@redhat.com> CC: virtualization@lists.linux-foundation.org CC: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 2月, 2019 1 次提交
-
-
由 Toshiaki Makita 提交于
Previously virtnet_xdp_xmit() did not account for device tx counters, which caused confusions. To be consistent with SKBs, account them on freeing xdp_frames. Reported-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 1月, 2019 7 次提交
-
-
由 Toshiaki Makita 提交于
We do not reset or free up unused buffers when enabling/disabling XDP, so it can happen that xdp_frames are freed after disabling XDP or sk_buffs are freed after enabling XDP on xdp tx queues. Thus we need to handle both forms (xdp_frames and sk_buffs) regardless of XDP setting. One way to trigger this problem is to disable XDP when napi_tx is enabled. In that case, virtnet_xdp_set() calls virtnet_napi_enable() which kicks NAPI. The NAPI handler will call virtnet_poll_cleantx() which invokes free_old_xmit_skbs() for queues which have been used by XDP. Note that even with this change we need to keep skipping free_old_xmit_skbs() from NAPI handlers when XDP is enabled, because XDP tx queues do not aquire queue locks. - v2: Use napi_consume_skb() instead of dev_consume_skb_any() Fixes: 4941d472 ("virtio-net: do not reset during XDP set") Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Toshiaki Makita 提交于
put_page() can work as a fallback for freeing xdp_frames, but the appropriate way is to use xdp_return_frame(). Fixes: cac320c8 ("virtio_net: convert to use generic xdp_frame and xdp_return_frame API") Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Toshiaki Makita 提交于
Commit 8dcc5b0a ("virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP") tried to avoid access to unexpected sq while XDP is disabled, but was not complete. There was a small window which causes out of bounds sq access in virtnet_xdp_xmit() while disabling XDP. An example case of - curr_queue_pairs = 6 (2 for SKB and 4 for XDP) - online_cpu_num = xdp_queue_paris = 4 when XDP is enabled: CPU 0 CPU 1 (Disabling XDP) (Processing redirected XDP frames) virtnet_xdp_xmit() virtnet_xdp_set() _virtnet_set_queues() set curr_queue_pairs (2) check if rq->xdp_prog is not NULL virtnet_xdp_sq(vi) qp = curr_queue_pairs - xdp_queue_pairs + smp_processor_id() = 2 - 4 + 1 = -1 sq = &vi->sq[qp] // out of bounds access set xdp_queue_pairs (0) rq->xdp_prog = NULL Basically we should not change curr_queue_pairs and xdp_queue_pairs while someone can read the values. Thus, when disabling XDP, assign NULL to rq->xdp_prog first, and wait for RCU grace period, then change xxx_queue_pairs. Note that we need to keep the current order when enabling XDP though. - v2: Make rcu_assign_pointer/synchronize_net conditional instead of _virtnet_set_queues. Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT") Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Toshiaki Makita 提交于
When XDP is disabled, curr_queue_pairs + smp_processor_id() can be larger than max_queue_pairs. There is no guarantee that we have enough XDP send queues dedicated for each cpu when XDP is disabled, so do not count drops on sq in that case. Fixes: 5b8f3c8d ("virtio_net: Add XDP related stats") Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Toshiaki Makita 提交于
When _virtnet_set_queues() failed we did not restore real_num_rx_queues. Fix this by placing the change of real_num_rx_queues after _virtnet_set_queues(). This order is also in line with virtnet_set_channels(). Fixes: 4941d472 ("virtio-net: do not reset during XDP set") Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Toshiaki Makita 提交于
When napi_tx is enabled, virtnet_poll_cleantx() called free_old_xmit_skbs() even for xdp send queue. This is bogus since the queue has xdp_frames, not sk_buffs, thus mangled device tx bytes counters because skb->len is meaningless value, and even triggered oops due to general protection fault on freeing them. Since xdp send queues do not aquire locks, old xdp_frames should be freed only in virtnet_xdp_xmit(), so just skip free_old_xmit_skbs() for xdp send queues. Similarly virtnet_poll_tx() called free_old_xmit_skbs(). This NAPI handler is called even without calling start_xmit() because cb for tx is by default enabled. Once the handler is called, it enabled the cb again, and then the handler would be called again. We don't need this handler for XDP, so don't enable cb as well as not calling free_old_xmit_skbs(). Also, we need to disable tx NAPI when disabling XDP, so virtnet_poll_tx() can safely access curr_queue_pairs and xdp_queue_pairs, which are not atomically updated while disabling XDP. Fixes: b92f1e67 ("virtio-net: transmit napi") Fixes: 7b0411ef ("virtio-net: clean tx descriptors from rx napi") Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Toshiaki Makita 提交于
Commit 4e09ff53 ("virtio-net: disable NAPI only when enabled during XDP set") tried to fix inappropriate NAPI enabling/disabling when !netif_running(), but was not complete. On error path virtio_net could enable NAPI even when !netif_running(). This can cause enabling NAPI twice on virtnet_open(), which would trigger BUG_ON() in napi_enable(). Fixes: 4941d472 ("virtio-net: do not reset during XDP set") Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 1月, 2019 2 次提交
-
-
由 Michael S. Tsirkin 提交于
Use napi_consume_skb() to get bulk free. Note that napi_consume_skb is safe to call in a non-napi context as long as the napi_budget flag is correct. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Acked-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
On multiqueue network devices, RPS maps are configured independently for each receive queue through /sys/class/net/$DEV/queues/rx-*. On virtio-net currently all packets use the map from rx-0, because the real rx queue is not known at time of map lookup by get_rps_cpu. Call skb_record_rx_queue in the driver rx path to make lookup work. Recording the receive queue has ramifications beyond RPS, such as in sticky load balancing decisions for sockets (skb_tx_hash) and XPS. Reported-by: NMark Hlady <mhlady@google.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 12月, 2018 1 次提交
-
-
由 Willem de Bruijn 提交于
Virtio-net devices negotiate LRO support with the host. Display the initially negotiated state with ethtool -k. Also allow configuring it with ethtool -K, reusing the existing virtnet_set_guest_offloads helper that configures LRO for XDP. This is conditional on VIRTIO_NET_F_CTRL_GUEST_OFFLOADS. Virtio-net negotiates TSO4 and TSO6 separately, but ethtool does not distinguish between the two. Display LRO as on only if any offload is active. RTNL is held while calling virtnet_set_features, same as on the path from virtnet_xdp_set. Changes v1 -> v2 - allow ethtool config (-K) only if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS - show LRO as enabled if any LRO variant is enabled - do not allow configuration while XDP is active - differentiate current features from the capable set, to restore on XDP down only those features that were active on XDP up - move test out of VIRTIO_NET_F_CSUM/TSO branch, which is tx only Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 12月, 2018 1 次提交
-
-
由 Jason Wang 提交于
We copy vnet header unconditionally in page_to_skb() this is wrong since XDP may modify the packet data. So let's keep a zeroed vnet header for not confusing the conversion between vnet header and skb metadata. In the future, we should able to detect whether or not the packet was modified and keep using the vnet header when packet was not touched. Fixes: f600b690 ("virtio_net: Add XDP support") Reported-by: NPavel Popa <pashinho1990@gmail.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 11月, 2018 2 次提交
-
-
由 Jason Wang 提交于
We don't support partial csumed packet since its metadata will be lost or incorrect during XDP processing. So fail the XDP set if guest_csum feature is negotiated. Fixes: f600b690 ("virtio_net: Add XDP support") Reported-by: NJesper Dangaard Brouer <brouer@redhat.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pavel Popa <pashinho1990@gmail.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we can receive partial csumed packets with metadata kept in the vnet_hdr. This may have several side effects: - It could be overridden by header adjustment, thus is might be not correct after XDP processing. - There's no way to pass such metadata information through XDP_REDIRECT to another driver. - XDP does not support checksum offload right now. So simply disable guest csum if possible in this the case of XDP. Fixes: 3f93522f ("virtio-net: switch off offloads on demand if possible on XDP set") Reported-by: NJesper Dangaard Brouer <brouer@redhat.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Pavel Popa <pashinho1990@gmail.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 10月, 2018 1 次提交
-
-
由 Ake Koomsin 提交于
Commit 713a98d9 ("virtio-net: serialize tx routine during reset") introduces netif_tx_disable() after netif_device_detach() in order to avoid use-after-free of tx queues. However, there are two issues. 1) Its operation is redundant with netif_device_detach() in case the interface is running. 2) In case of the interface is not running before suspending and resuming, the tx does not get resumed by netif_device_attach(). This results in losing network connectivity. It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for serializing tx routine during reset. This also preserves the symmetry of netif_device_detach() and netif_device_attach(). Fixes commit 713a98d9 ("virtio-net: serialize tx routine during reset") Signed-off-by: NAke Koomsin <ake@igel.co.jp> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 10月, 2018 1 次提交
-
-
由 Jason Wang 提交于
Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers. Interrupt moderation is currently not supported, so these accept and display the default settings of 0 usec and 1 frame. Toggle tx napi through setting tx-frames. So as to not interfere with possible future interrupt moderation, value 1 means tx napi while value 0 means not. Only allow the switching when device is down for simplicity. Link: https://patchwork.ozlabs.org/patch/948149/Suggested-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 9月, 2018 1 次提交
-
-
由 Eric Dumazet 提交于
As diagnosed by Song Liu, ndo_poll_controller() can be very dangerous on loaded hosts, since the cpu calling ndo_poll_controller() might steal all NAPI contexts (for all RX/TX queues of the NIC). This capture can last for unlimited amount of time, since one cpu is generally not able to drain all the queues under load. virto_net uses NAPI for TX completions, so we better let core networking stack call the napi->poll() to avoid the capture. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 8月, 2018 1 次提交
-
-
由 YueHaibing 提交于
Remove duplicated include linux/netdevice.h Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 8月, 2018 2 次提交
-
-
由 Caleb Raitto 提交于
Always set the affinity hint, even if #cpu != #vq. Handle the case where #cpu > #vq (including when #cpu % #vq != 0) and when #vq > #cpu (including when #vq % #cpu != 0). Signed-off-by: NCaleb Raitto <caraitto@google.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NJon Olson <jonolson@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Caleb Raitto 提交于
Make vp_set_vq_affinity() take a cpumask instead of taking a single CPU. If there are fewer queues than cores, queue affinity should be able to map to multiple cores. Link: https://patchwork.ozlabs.org/patch/948149/Suggested-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NCaleb Raitto <caraitto@google.com> Acked-by: NGonglei <arei.gonglei@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 8月, 2018 1 次提交
-
-
由 Andrei Vagin 提交于
The definition of static_key_slow_inc() has cpus_read_lock in place. In the virtio_net driver, XPS queues are initialized after setting the queue:cpu affinity in virtnet_set_affinity() which is already protected within cpus_read_lock. Lockdep prints a warning when we are trying to acquire cpus_read_lock when it is already held. This patch adds an ability to call __netif_set_xps_queue under cpus_read_lock(). Acked-by: NJason Wang <jasowang@redhat.com> ============================================ WARNING: possible recursive locking detected 4.18.0-rc3-next-20180703+ #1 Not tainted -------------------------------------------- swapper/0/1 is trying to acquire lock: 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: static_key_slow_inc+0xe/0x20 but task is already holding lock: 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(cpu_hotplug_lock.rw_sem); lock(cpu_hotplug_lock.rw_sem); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by swapper/0/1: #0: 00000000244bc7da (&dev->mutex){....}, at: __driver_attach+0x5a/0x110 #1: 00000000cf973d46 (cpu_hotplug_lock.rw_sem){++++}, at: init_vqs+0x513/0x5a0 #2: 000000005cd8463f (xps_map_mutex){+.+.}, at: __netif_set_xps_queue+0x8d/0xc60 v2: move cpus_read_lock() out of __netif_set_xps_queue() Cc: "Nambiar, Amritha" <amritha.nambiar@intel.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Fixes: 8af2c06f ("net-sysfs: Add interface for Rx queue(s) map per Tx queue") Signed-off-by: NAndrei Vagin <avagin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 8月, 2018 1 次提交
-
-
由 Gustavo A. R. Silva 提交于
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 1402059 ("Missing break in switch") Addresses-Coverity-ID: 1402060 ("Missing break in switch") Addresses-Coverity-ID: 1402061 ("Missing break in switch") Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-