- 03 Apr 2023, 1 commit

Committed by Vladimir Oltean

The fact that PTP 2-step TX timestamping is broken on DSA switches if the master also timestamps the same packets is documented by commit f685e609 ("net: dsa: Deny PTP on master if switch supports it"). We attempt to help users avoid shooting themselves in the foot by making DSA reject the timestamping ioctls on an interface that is a DSA master when the switch tree beneath it contains switches which are aware of PTP.

The only problem is that there isn't an established way of intercepting ndo_eth_ioctl calls. So DSA places an avoidable burden on the network stack by creating a struct dsa_netdevice_ops with overlaid function pointers that are manually checked from the relevant call sites. There used to be 2 such dsa_netdevice_ops, but now ndo_eth_ioctl is the only one left.

There is an ongoing effort to migrate driver-visible hardware timestamping control from the ndo_eth_ioctl() based API to a new ndo_hwtstamp_set() model. DSA actively prevents that migration, because dsa_master_ioctl() is currently coded to manually call the master's legacy ndo_eth_ioctl(). Whenever a network device driver would be converted to the new API, DSA's restrictions would be circumvented, because any device could be used as a DSA master.

The established way for unrelated modules to react to a net device event is via netdevice notifiers. So we create a new notifier which gets called whenever there is an attempt to change hardware timestamping settings on a device.

Finally, there is another reason why a netdev notifier will be a good idea beyond DSA, and it has to do with PHY timestamping. With ndo_eth_ioctl(), all MAC drivers must manually call phy_has_hwtstamp() to decide whether to act upon SIOCSHWTSTAMP themselves or pass this ioctl to the PHY driver via phy_mii_ioctl(). With the new ndo_hwtstamp_set() API, it will be desirable to simply not make any calls into the MAC device driver when timestamping should be performed at the PHY level.

But there exist drivers, such as the lan966x switch, which need to install packet traps for PTP regardless of whether they are the layer that provides the hardware timestamps or the PHY is. That would be impossible to support with the new API. The proposal there, too, is to introduce a netdev notifier that acts as a better cue than ndo_hwtstamp_set() for switching drivers to add or remove PTP packet traps.

The one introduced here "almost" works there as well, except that packet traps should only be installed if the PHY driver succeeded in enabling hardware timestamping, whereas here we need to deny hardware timestamping on the DSA master before it actually gets enabled. This is why this notifier is called "PRE_", and the notifier used for PHY timestamping and packet traps would be called NETDEV_CHANGE_HWTSTAMP. This isn't a new concept: NETDEV_CHANGEUPPER and NETDEV_PRECHANGEUPPER, for example, do the same thing.

In anticipation of a future netlink UAPI, we also pass a non-NULL extack pointer to the netdev notifier, and we make DSA populate it with an informative reason for the rejection. So that the message does not go to waste, the ioctl-based dev_set_hwtstamp() creates a fake extack and prints the message to the kernel log.

Link: https://lore.kernel.org/netdev/20230401191215.tvveoi3lkawgg6g4@skbuf/
Link: https://lore.kernel.org/netdev/20230310164451.ls7bbs6pdzs4m6pw@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
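The "PRE_" notifier pattern can be pictured with a small userspace model. Every name below (`ts_netdev_sketch`, `dsa_pre_hwtstamp_sketch`, `dev_set_hwtstamp_sketch`) is hypothetical, and the message text and `-EBUSY` code are illustrative assumptions. The model does not reproduce the kernel's notifier API. It only shows the ordering: listeners may veto and fill an extack-like message before the driver is touched, and the ioctl-style path prints that message to the log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for netlink_ext_ack: a rejecting listener fills msg. */
struct extack_sketch {
    const char *msg;
};

struct ts_netdev_sketch {
    const char *name;
    bool is_dsa_master;
    bool tree_has_ptp_switch;
    bool hwtstamp_on;
};

/* Listeners return 0 to allow the change, or a negative errno to veto it. */
typedef int (*pre_hwtstamp_fn)(const struct ts_netdev_sketch *dev,
                               struct extack_sketch *extack);

/* Listener modelled on the DSA policy described in the commit message. */
int dsa_pre_hwtstamp_sketch(const struct ts_netdev_sketch *dev,
                            struct extack_sketch *extack)
{
    if (dev->is_dsa_master && dev->tree_has_ptp_switch) {
        extack->msg = "HW timestamping denied: PTP-aware switch below this DSA master";
        return -EBUSY; /* error code chosen for illustration */
    }
    return 0;
}

static const pre_hwtstamp_fn pre_chain[] = { dsa_pre_hwtstamp_sketch };

/* Core path: run every PRE_ listener before calling into the driver. The
 * ioctl path has no netlink reply, so a veto message goes to the log. */
int dev_set_hwtstamp_sketch(struct ts_netdev_sketch *dev, bool enable)
{
    struct extack_sketch extack = { NULL };

    for (size_t i = 0; i < sizeof(pre_chain) / sizeof(pre_chain[0]); i++) {
        int err = pre_chain[i](dev, &extack);

        if (err) {
            if (extack.msg)
                fprintf(stderr, "%s: %s\n", dev->name, extack.msg);
            return err;
        }
    }
    dev->hwtstamp_on = enable; /* stands in for the driver's hwtstamp op */
    return 0;
}
```

A NETDEV_CHANGE_HWTSTAMP-style "post" notifier would instead run after the driver call succeeded. That later point is where the packet-trap installation discussed above would fit.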
-
- 30 Mar 2023, 4 commits

Committed by Eric Dumazet

____napi_schedule() adds a napi to the current cpu's softnet_data poll_list, then raises NET_RX_SOFTIRQ to make sure net_rx_action() will process it.

The idea of this patch is to not raise NET_RX_SOFTIRQ when being called indirectly from net_rx_action(), because we can process poll_list from this point without going through the full softirq loop.

This needs a change in net_rx_action() to make sure we restart its main loop if sd->poll_list was updated without NET_RX_SOFTIRQ being raised.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
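The toy single-CPU model below illustrates the idea. Its types (`softnet_sketch`, `napi_sketch`) are hypothetical simplifications, not the kernel's `struct softnet_data` or `struct napi_struct`. Scheduling from inside the RX loop only appends to the list. The loop notices the refill and restarts instead of relying on a fresh softirq raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAPI_SKETCH 8

/* A NAPI instance whose poll may indirectly schedule another instance. */
struct napi_sketch {
    struct napi_sketch *wakes; /* instance to schedule when polled, or NULL */
};

/* Per-CPU state, loosely modelled on softnet_data. */
struct softnet_sketch {
    struct napi_sketch *poll_list[MAX_NAPI_SKETCH];
    size_t poll_len;
    bool in_net_rx_action; /* only touched by the owning CPU */
    int softirq_raises;    /* counts simulated NET_RX_SOFTIRQ raises */
};

void napi_schedule_sketch(struct softnet_sketch *sd, struct napi_sketch *n)
{
    assert(sd->poll_len < MAX_NAPI_SKETCH);
    sd->poll_list[sd->poll_len++] = n;
    /* Inside the RX loop, the loop itself will pick up the new entry. */
    if (!sd->in_net_rx_action)
        sd->softirq_raises++;
}

/* Returns the number of polls performed. */
int net_rx_action_sketch(struct softnet_sketch *sd)
{
    struct napi_sketch *local[MAX_NAPI_SKETCH];
    size_t local_len;
    int polls = 0;

    sd->in_net_rx_action = true;
start:
    /* Move the per-CPU list into a private list, then process that. */
    local_len = sd->poll_len;
    memcpy(local, sd->poll_list, local_len * sizeof(local[0]));
    sd->poll_len = 0;

    for (size_t i = 0; i < local_len; i++) {
        struct napi_sketch *n = local[i];

        polls++;
        if (n->wakes) {
            struct napi_sketch *w = n->wakes;

            n->wakes = NULL;
            napi_schedule_sketch(sd, w); /* no softirq raise here */
        }
    }
    /* Entries queued without a raise would otherwise wait for the next
     * softirq, so restart the main loop instead. */
    if (sd->poll_len > 0)
        goto start;

    sd->in_net_rx_action = false;
    return polls;
}
```

In this model, an instance scheduled from outside raises the softirq once. An instance scheduled from within a poll is handled by the restart, so the raise count stays at one.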
-
Committed by Eric Dumazet

Based on an initial patch from Jason Xing.

The idea is to not raise NET_RX_SOFTIRQ from napi_schedule_rps() when we queued a packet into another cpu's backlog.

We can do this only when we are called indirectly from net_rx_action(). That guarantees our rps_ipi_list will be processed before we exit from net_rx_action().

Link: https://lore.kernel.org/lkml/20230325152417.5403-1-kerneljasonxing@gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Committed by Eric Dumazet

We want to make two optimizations in napi_schedule_rps() and ____napi_schedule(). Both require knowing whether these helpers are called from net_rx_action() rather than from other contexts.

sd.in_net_rx_action is only read/written by the owning cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Committed by Eric Dumazet

The return value of napi_schedule_rps() is ignored, so remove it.

Change the comment to clarify the intent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
- 23 Mar 2023, 2 commits

Committed by Eric Dumazet

We want to remove our use of skb_mac_header() in tx paths, e.g. remove skb_reset_mac_header() from __dev_queue_xmit().

The idea is that ndo_start_xmit() can get the mac header simply by looking at skb->data.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Nick Child

When setting the XPS value of a TX queue, warn the user once if the index of the queue is out of range, i.e. not less than the number of allocated TX queues.

Previously, this scenario went uncaught. In the best case, it resulted in unnecessary allocations. In the worst case, it resulted in out-of-bounds memory references through calls to `netdev_get_tx_queue(dev, index)`.

Therefore, it is important to inform the user, but it is not worth returning an error and risking downing the netdevice.

Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Link: https://lore.kernel.org/r/20230321150725.127229-1-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
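The "warn once, but don't fail" policy can be sketched in userspace as follows. This is not the kernel's `__netif_set_xps_queue()`. The struct, the counters, and `warn_on_once_sketch()` (a stand-in for `WARN_ON_ONCE()`) are hypothetical. Returning 0 on the bad index follows the commit's stated intent rather than the exact kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model; not the kernel's __netif_set_xps_queue(). */
struct xps_netdev_sketch {
    unsigned int num_tx_queues;
};

int xps_updates_sketch;  /* successful map updates */
int xps_warnings_sketch; /* warnings actually printed */

/* Emulates WARN_ON_ONCE(): print only the first hit, but always return cond.
 * One static flag stands in for the macro's per-call-site state. */
bool warn_on_once_sketch(bool cond, const char *what)
{
    static bool warned;

    if (cond && !warned) {
        warned = true;
        xps_warnings_sketch++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

int set_xps_queue_sketch(const struct xps_netdev_sketch *dev, unsigned int index)
{
    /* Valid indices are 0 .. num_tx_queues - 1. */
    if (warn_on_once_sketch(index >= dev->num_tx_queues,
                            "XPS queue index out of range"))
        return 0; /* warn, but do not fail the caller */

    /* ... the real code would look up the queue and update its XPS map ... */
    xps_updates_sketch++;
    return 0;
}
```

Note the `>=` comparison. An index equal to `num_tx_queues` is already one past the last queue.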
-
- 08 Mar 2023, 1 commit

Committed by Eric Dumazet

enum skb_drop_reason is more generic, so we can adopt it instead of enum skb_free_reason.

Provide dev_kfree_skb_irq_reason() and dev_kfree_skb_any_reason().

This means drivers can use more precise drop reasons if they want to.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Link: https://lore.kernel.org/r/20230306204313.10492-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 24 Feb 2023, 1 commit

Committed by Eric Dumazet

dev_kfree_skb() is aliased to consume_skb().

When a driver is dropping a packet by calling dev_kfree_skb_any(), we should propagate the drop reason instead of pretending the packet was consumed.

Note: now that we have enum skb_drop_reason, we could remove enum skb_free_reason (for linux-6.4).

v2: added an unlikely(), suggested by Yunsheng Lin.

Fixes: e6247027 ("net: introduce dev_consume_skb_any()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 20 Feb 2023, 1 commit

Committed by Eric Dumazet

kfree_skb() includes the location, so it makes sense to add it to consume_skb() as well.

After patch:

    taskd_EventMana 8602 [004] 420.406239: skb:consume_skb: skbaddr=0xffff893a4a6d0500 location=unix_stream_read_generic
    swapper 0 [011] 422.732607: skb:consume_skb: skbaddr=0xffff89597f68cee0 location=mlx4_en_free_tx_desc
    discipline 9141 [043] 423.065653: skb:consume_skb: skbaddr=0xffff893a487e9c00 location=skb_consume_udp
    swapper 0 [010] 423.073166: skb:consume_skb: skbaddr=0xffff8949ce9cdb00 location=icmpv6_rcv
    borglet 8672 [014] 425.628256: skb:consume_skb: skbaddr=0xffff8949c42e9400 location=netlink_dump
    swapper 0 [028] 426.263317: skb:consume_skb: skbaddr=0xffff893b1589dce0 location=net_rx_action
    wget 14339 [009] 426.686380: skb:consume_skb: skbaddr=0xffff893a51b552e0 location=tcp_rcv_state_process

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Feb 2023, 3 commits

Committed by Ido Schimmel

The cited commit changed devlink to register its netdev notifier block on the global netdev notifier chain instead of on the per network namespace one.

However, when changing the network namespace of the devlink instance, devlink still tries to unregister its notifier block from the chain of the old namespace and register it on the chain of the new namespace.

This corrupts the notifier chains, as the same notifier block is registered on two different chains: the global one and the per network namespace one. In turn, this causes other problems, such as the inability to dismantle namespaces due to netdev reference count issues.

Fix by preventing devlink from moving its notifier block between namespaces.

Reproducer:

    # echo "10 1" > /sys/bus/netdevsim/new_device
    # ip netns add test123
    # devlink dev reload netdevsim/netdevsim10 netns test123
    # ip netns del test123
    [ 71.935619] unregister_netdevice: waiting for lo to become free. Usage count = 2
    [ 71.938348] leaked reference.

Fixes: 565b4824 ("devlink: change port event netdev notifier from per-net to global")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230215073139.1360108-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Committed by Jesse Brandeburg

The kernel stack can be more consistent by printing the IFF_PROMISC (promiscuous enable/disable) messages with the standard netdev_info message, which can include bus and driver info as well as the device.

Typical command usage from user space looks like:

    ip link set eth0 promisc <on|off>

But lots of utilities, such as bridge, tcpdump, etc., put the interface into promiscuous mode.

Old message:

    [ 406.034418] device eth0 entered promiscuous mode
    [ 408.424703] device eth0 left promiscuous mode

New message:

    [ 406.034431] ice 0000:17:00.0 eth0: entered promiscuous mode
    [ 408.424715] ice 0000:17:00.0 eth0: left promiscuous mode

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Committed by Jesse Brandeburg

When the user sets or clears the IFF_ALLMULTI flag in the netdev, no messages are printed to the kernel log to indicate that anything happened. This is inexplicably different from most other dev->flags changes and could surprise the user.

Typically this occurs from user space when a user runs:

    ip link set eth0 allmulticast <on|off>

However, other devices like bridge set allmulticast as well, and many other flows might trigger entry into allmulticast.

The new message uses the standard netdev_info print and looks like:

    [ 413.246110] ixgbe 0000:17:00.0 eth0: entered allmulticast mode
    [ 415.977184] ixgbe 0000:17:00.0 eth0: left allmulticast mode

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
- 13 Feb 2023, 1 commit

Committed by Felix Riemann

When converting net_device_stats to rtnl_link_stats64, sign extension is triggered on ILP32 machines. Commit 6c1c5097 changed the previous "ulong -> u64" conversion to "long -> u64" by accessing the net_device_stats fields through a (signed) atomic_long_t.

This causes, for example, the received bytes counter to jump to 16EiB after having received 2^31 bytes. Casting the atomic value to "unsigned long" before converting it into u64 avoids this.

Fixes: 6c1c5097 ("net: add atomic_long_t to net_device_stats fields")
Signed-off-by: Felix Riemann <felix.riemann@sma.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
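The arithmetic behind the 16EiB jump can be reproduced on any host. The sketch assumes `int32_t`/`uint32_t` stand in for the 32-bit `long`/`unsigned long` of an ILP32 machine. The two function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* int32_t plays the role of a 32-bit "long" on ILP32. */

/* Buggy: a signed 32-bit counter widened straight to u64 is sign-extended. */
uint64_t stat_to_u64_buggy(int32_t counter)
{
    return (uint64_t)counter;
}

/* Fixed: reinterpret as unsigned 32-bit first, so widening zero-extends. */
uint64_t stat_to_u64_fixed(int32_t counter)
{
    return (uint64_t)(uint32_t)counter;
}
```

After 2^31 bytes the 32-bit counter's bit pattern is `0x80000000`, which reads as `INT32_MIN` when signed. The buggy path turns that into `0xFFFFFFFF80000000` (just under 2^64, i.e. about 16 EiB). The fixed path yields `0x80000000`.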
-
- 03 Feb 2023, 1 commit

Committed by Jakub Kicinski

Add a Netlink spec-compatible family for netdevs. This is a very simple implementation without much thought going into it.

It allows us to reap all the benefits of Netlink specs. One can use the generic client to issue the commands:

    $ ./cli.py --spec netdev.yaml --dump dev_get
    [{'ifindex': 1, 'xdp-features': set()},
     {'ifindex': 2, 'xdp-features': {'basic', 'ndo-xmit', 'redirect'}},
     {'ifindex': 3, 'xdp-features': {'rx-sg'}}]

The generic python library does not have flags-by-name support yet, but we also don't have to carry strings in the messages, as user space can get the names from the spec.

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Co-developed-by: Marek Majtyka <alardam@gmail.com>
Signed-off-by: Marek Majtyka <alardam@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/327ad9c9868becbe1e601b580c962549c8cd81f2.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 02 Feb 2023, 1 commit

Committed by Xin Long

This patch introduces per-device gso_ipv4_max_size and gro_ipv4_max_size and adds netlink attributes for them, so that IPv4 BIG TCP can be guarded by a separate tunable in the next patch.

To avoid breaking old applications that use "gso/gro_max_size" for IPv4 GSO packets, this patch updates "gso/gro_ipv4_max_size" in netif_set_gso/gro_max_size() if the new size isn't greater than GSO_LEGACY_MAX_SIZE. Nothing changes even if userspace doesn't know about the new netlink attributes.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
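The compatibility rule can be sketched as follows. This is a minimal model, assuming GSO_LEGACY_MAX_SIZE is 65536 as in current kernels. The struct and the two setter functions are hypothetical, and only the GSO side is modelled; the GRO side would be symmetric.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the kernel's GSO_LEGACY_MAX_SIZE (64 KiB). */
#define GSO_LEGACY_MAX_SIZE_SKETCH 65536u

struct netdev_gso_sketch {
    uint32_t gso_max_size;      /* legacy knob */
    uint32_t gso_ipv4_max_size; /* new IPv4-only knob */
};

/* Old tools only know the legacy knob. Mirror it into the IPv4 knob while
 * it stays within the legacy limit, so their behaviour is unchanged. Larger
 * (BIG TCP) sizes are not copied, so IPv4 needs its own opt-in. */
void set_gso_max_size_sketch(struct netdev_gso_sketch *dev, uint32_t size)
{
    dev->gso_max_size = size;
    if (size <= GSO_LEGACY_MAX_SIZE_SKETCH)
        dev->gso_ipv4_max_size = size;
}

/* New tools can set the IPv4 knob directly through the new attribute. */
void set_gso_ipv4_max_size_sketch(struct netdev_gso_sketch *dev, uint32_t size)
{
    dev->gso_ipv4_max_size = size;
}
```

Lowering the legacy size to 32 KiB therefore lowers the IPv4 size too. Raising the legacy size past 64 KiB leaves the IPv4 size where it was.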
-
- 24 Jan 2023, 3 commits

Committed by Jesper Dangaard Brouer

The spin_lock irqsave/restore API variant in skb_defer_free_flush() can be replaced with the faster spin_lock irq variant, which doesn't need to read and restore the CPU flags.

Using the unconditional irq "disable/enable" API variant is safe, because the skb_defer_free_flush() function is only called during NAPI-RX processing in net_rx_action(), where IRQs are known to be enabled.

The expected gain is 14 cycles from avoiding reading and restoring CPU flags in a spin_lock_irqsave/restore operation, measured via a microbenchmark kernel module[1] on CPU E5-1650 v4 @ 3.60GHz.

Microbenchmark overhead of spin_lock+unlock:
 - spin_lock_unlock_irq cost: 34 cycles(tsc) 9.486 ns
 - spin_lock_unlock_irqsave cost: 48 cycles(tsc) 13.567 ns

We don't expect to see a measurable packet performance gain, as skb_defer_free_flush() is called infrequently: once per NIC device NAPI bulk cycle, and only if SKBs have been deferred by other CPUs via skb_attempt_defer_free().

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/167421646327.1321776.7390743166998776914.stgit@firesoul
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Stanislav Fomichev

New flag BPF_F_XDP_DEV_BOUND_ONLY, plus all the infrastructure needed to associate a netdev with a BPF program at load time.

netdevsim checks are dropped in favor of a generic check in dev_xdp_attach().

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-6-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Committed by Stanislav Fomichev

The BPF offloading infrastructure will be reused to implement bound-but-not-offloaded BPF programs. Rename the existing helpers for clarity. No functional changes.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
- 19 Dec 2022, 1 commit

Committed by Miaoqian Lin

unregister_netdevice_notifier_net() is used to unregister a notifier registered by register_netdevice_notifier_net(). Also s/into/from/.

Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 04 Dec 2022, 1 commit

Committed by Heiner Kallweit

Add a helper for drivers that want to enable SW IRQ coalescing by default. The related sysfs attributes can be used to override the default values.

Following Jakub's suggestion, put this functionality into net core so that drivers wanting to use software interrupt coalescing by default don't have to open-code it.

Note that this function needs to be called before the netdevice is registered.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 Nov 2022, 1 commit

Committed by Eric Dumazet

Dan reported a new warning after my recent patch:

    New smatch warnings:
    net/core/dev.c:6409 napi_disable() error: uninitialized symbol 'new'.

Indeed, we must first wait for STATE_SCHED and STATE_NPSVC to be cleared, to make sure the @new variable has been initialized properly.

Fixes: 4ffa1d1c ("net: adopt try_cmpxchg() in napi_{enable|disable}()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
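The "wait first, then compute @new" shape of a try_cmpxchg() loop can be sketched with C11 atomics. `atomic_compare_exchange_weak()` has the same contract as `try_cmpxchg()`: on failure it writes the current value back into the expected variable. The bit names, the wait callback, and `disable_sketch()` are hypothetical and do not reproduce the kernel's NAPI state layout.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative state bits; values are arbitrary, not the kernel's. */
#define ST_SCHED    (1ul << 0)
#define ST_NPSVC    (1ul << 1)
#define ST_THREADED (1ul << 2)

/* Called while busy bits are set; stands in for usleep_range(). */
typedef void (*wait_fn_sketch)(atomic_ulong *state);

/* Simulated owner that finishes its poll and clears the busy bits. */
void owner_finishes_sketch(atomic_ulong *state)
{
    atomic_fetch_and(state, ~(ST_SCHED | ST_NPSVC));
}

void disable_sketch(atomic_ulong *state, wait_fn_sketch wait_cb)
{
    unsigned long val = atomic_load(state);
    unsigned long new;

    do {
        /* Wait for the busy bits to clear *before* computing new. Using
         * `continue` here instead would jump to the loop condition and run
         * the compare-exchange with new never assigned. */
        while (val & (ST_SCHED | ST_NPSVC)) {
            wait_cb(state);
            val = atomic_load(state);
        }
        new = val | ST_SCHED | ST_NPSVC;
        new &= ~ST_THREADED;
        /* On failure, val is refreshed with the current state. */
    } while (!atomic_compare_exchange_weak(state, &val, new));
}
```

Because a failed exchange refreshes `val`, the loop never needs a separate reload before retrying. Any busy bits that reappeared are caught by the inner wait.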
-
- 16 Nov 2022, 4 commits

Committed by Eric Dumazet

Long-standing KCSAN issues are caused by data races around some dev->stats changes.

Most performance-critical paths already use per-cpu or per-queue variables.

It is reasonable (and more correct) to use atomic operations for the slow paths.

This patch adds a union for each field of net_device_stats, so that we can convert paths that are not yet protected by a spinlock or a mutex.

netdev_stats_to_stats64() no longer has an #if BITS_PER_LONG==64.

Note that the memcpy() we were using on 64bit arches had no provision to avoid load-tearing, while atomic_long_read() provides the needed protection at no cost.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet

This makes the code a bit cleaner.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet

This makes the code slightly more efficient.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet

Adopting atomic_try_cmpxchg() makes the code cleaner.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 Nov 2022, 1 commit

Committed by Jiri Pirko

Currently, the net_dev() netdev notifier variant follows the netdev, together with its per-net notifier, from namespace to namespace. This is implemented by the move_netdevice_notifiers_dev_net() helper.

Devlink needs to re-register its per-net notifier during devlink reload. Introduce a new helper called move_netdevice_notifier_net() and share the unregister/register code with the existing move_netdevice_notifiers_dev_net() helper.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 09 Nov 2022, 1 commit

Committed by Andy Ren

Allow a network interface to be renamed when the interface is up.

As described in the netconsole documentation [1], when netconsole is used as a built-in, it will bring up the specified interface as soon as possible. As a result, user space will not be able to rename the interface, since the kernel disallows renaming of interfaces that are administratively up unless the 'IFF_LIVE_RENAME_OK' private flag was set by the kernel.

The original solution [2] to this problem was to add a new netconsole configuration parameter that allows renaming of the interface used by netconsole while it is administratively up. However, the discussion that followed showed there is no reason to keep the current restriction. Instead, user space should be allowed to rename interfaces regardless of their administrative state:

1. The restriction was put in place over 20 years ago, when renaming was only possible via IOCTL and before rtnetlink started notifying user space about such changes as it does today.

2. The 'IFF_LIVE_RENAME_OK' flag was added over 3 years ago, in version 5.2, and no regressions were reported.

3. In-kernel listeners to 'NETDEV_CHANGENAME' do not seem to care about the administrative state of the interface.

Therefore, allow user space to rename running interfaces by removing the restriction and the associated 'IFF_LIVE_RENAME_OK' flag. To help with possible triage, emit a message to the kernel log when an interface is renamed while UP.

[1] https://www.kernel.org/doc/Documentation/networking/netconsole.rst
[2] https://lore.kernel.org/netdev/20221102002420.2613004-1-andy.ren@getcruise.com/

Signed-off-by: Andy Ren <andy.ren@getcruise.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 04 Nov 2022, 1 commit

Committed by Jiri Pirko

Currently, ethernet drivers use devlink_port_type_eth_set() and devlink_port_type_clear() to set the devlink port type and link it to the related netdev.

Instead of calling them directly, let the driver use the SET_NETDEV_DEVLINK_PORT macro to assign the devlink_port pointer, and let devlink track it. Note that the devlink port pointer is static for as long as the netdevice is registered.

In devlink code, use the per-namespace netdev notifier to track the netdevices with a devlink_port assigned, and change the internal devlink_port type and related type pointer accordingly.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 01 Nov 2022, 2 commits

Committed by Hangbin Liu

Add a new helper, unregister_netdevice_many_notify(), which takes the netlink message header and portid so it can notify userspace when the NLM_F_ECHO flag is set.

Make unregister_netdevice_many() a wrapper around the new unregister_netdevice_many_notify().

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Hangbin Liu

This patch passes the netlink message header and portid to rtnl_configure_link().

All the functions in this call chain need the new parameters, so that the last call, rtnl_notify(), can notify userspace about the new link info if the NLM_F_ECHO flag is set:

- rtnl_configure_link()
- __dev_notify_flags()
- rtmsg_ifinfo()
- rtmsg_ifinfo_event()
- rtmsg_ifinfo_build_skb()
- rtmsg_ifinfo_send()
- rtnl_notify()

Also move the __dev_notify_flags() declaration to net/core/dev.h, as Jakub suggested.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 29 Oct 2022, 1 commit

Committed by Thomas Gleixner

Now that the 32bit UP oddity is gone and 32bit always uses a sequence count, there is no need for the fetch_irq() variants anymore.

Convert to the regular interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 26 Oct 2022, 1 commit

Committed by Kees Cook

One of the worst offenders of "fake flexible arrays" is struct sockaddr. It is the classic example of why GCC and Clang have traditionally been forced to treat all trailing arrays as fake flexible arrays: in the distant misty past, sa_data became too small, and code started treating it as a flexible array even though it was fixed-size. The compiler's special case is specifically that sizeof(sa->sa_data) and FORTIFY_SOURCE (which uses __builtin_object_size(sa->sa_data, 1)) do not agree (14 and -1 respectively), which makes FORTIFY_SOURCE treat it as a flexible array.

However, the coming -fstrict-flex-arrays compiler flag will remove these special cases, so that FORTIFY_SOURCE can gain coverage over all the trailing arrays in the kernel that are _not_ supposed to be treated as flexible arrays. To deal with this change, convert sa_data to a true flexible array. To keep the structure size the same, move sa_data into a union with a newly introduced sa_data_min of the original size. As a result, FORTIFY_SOURCE can continue to have no idea how large sa_data may actually be, but anything using sizeof(sa->sa_data) must switch to sizeof(sa->sa_data_min).

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Dylan Yudaken <dylany@fb.com>
Cc: Yajun Deng <yajun.deng@linux.dev>
Cc: Petr Machata <petrm@nvidia.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: syzbot <syzkaller@googlegroups.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221018095503.never.671-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
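A rough userspace approximation of the new layout is shown below. It assumes GCC or Clang, because it relies on two GNU extensions: an empty struct, and a flexible array nested inside a union. The kernel's `__DECLARE_FLEX_ARRAY()` helper uses the same trick. The struct name `sockaddr_sketch` is hypothetical, chosen to avoid clashing with libc's `struct sockaddr`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; GNU C extensions required (empty struct, flexible
 * array member inside a union). */
struct sockaddr_sketch {
    unsigned short sa_family;
    union {
        char sa_data_min[14];      /* original fixed size: keeps sizeof at 16 */
        struct {
            struct { } empty_sa_data; /* lets a flexible array sit in a union */
            char sa_data[];           /* true flexible array: size unknown */
        };
    };
};

/* sizeof(sa->sa_data) no longer compiles (incomplete array type), so code
 * that needs the historical 14 bytes must name sa_data_min instead. */
size_t sockaddr_min_data_len(const struct sockaddr_sketch *sa)
{
    return sizeof(sa->sa_data_min);
}
```

Both union members start at the same offset, so existing users of `sa_data` see the same bytes as before. Only `sizeof` users have to change.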
-
- 19 Oct 2022, 1 commit

Committed by Paul Blakey

Currently, qdisc ingress handling (sch_handle_ingress()) doesn't set a return value, so it is left at the caller's (__netif_receive_skb_core()) initial return value, which is RX drop. If the packet is consumed, the caller will stop and return this value as if the packet was dropped.

This causes a problem in the kernel tcp stack when an egress tc rule forwards to an ingress tc rule. The tcp stack sending packets on the device with the egress rule will see the packets as not successfully transmitted (although they actually were) and will not advance its internal state of sent data. Packets returning on such a tcp stream will then be dropped by the tcp stack with reason ack-of-unsent-data. See the reproduction in [0] below.

Fix that by setting the return value to RX success if the packet was handled successfully.

[0] Reproduction steps:

    $ ip link add veth1 type veth peer name peer1
    $ ip link add veth2 type veth peer name peer2
    $ ifconfig peer1 5.5.5.6/24 up
    $ ip netns add ns0
    $ ip link set dev peer2 netns ns0
    $ ip netns exec ns0 ifconfig peer2 5.5.5.5/24 up
    $ ifconfig veth2 0 up
    $ ifconfig veth1 0 up

    #ingress forwarding veth1 <-> veth2
    $ tc qdisc add dev veth2 ingress
    $ tc qdisc add dev veth1 ingress
    $ tc filter add dev veth2 ingress prio 1 proto all flower \
        action mirred egress redirect dev veth1
    $ tc filter add dev veth1 ingress prio 1 proto all flower \
        action mirred egress redirect dev veth2

    #steal packet from peer1 egress to veth2 ingress, bypassing the veth pipe
    $ tc qdisc add dev peer1 clsact
    $ tc filter add dev peer1 egress prio 20 proto ip flower \
        action mirred ingress redirect dev veth1

    #run iperf and see connection not running
    $ iperf3 -s&
    $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1

    #delete egress rule, and run again, now should work
    $ tc filter del dev peer1 egress
    $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1

Fixes: f697c3e8 ("[NET]: Avoid unnecessary cloning for ingress filtering")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
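The return-value bug can be modelled in a few lines. This is a hypothetical userspace sketch, not `sch_handle_ingress()` itself. The enum values, the verdict names, and the `apply_fix` switch are illustrative assumptions. The caller starts with a "drop" result, and the ingress hook either overwrites it for consumed packets (fixed) or leaves it alone (buggy).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative codes modelled on NET_RX_SUCCESS / NET_RX_DROP. */
enum { RX_SUCCESS_SKETCH = 0, RX_DROP_SKETCH = 1 };

enum tc_verdict_sketch {
    TC_PASS_SKETCH,     /* continue normal receive processing */
    TC_CONSUMED_SKETCH, /* taken by an action, e.g. a mirred redirect */
    TC_SHOT_SKETCH      /* genuinely dropped */
};

/* Hypothetical ingress hook: returns true if it took the packet, and
 * (with the fix) reports success for packets it delivered elsewhere. */
bool handle_ingress_sketch(enum tc_verdict_sketch verdict, int *ret,
                           bool apply_fix)
{
    switch (verdict) {
    case TC_PASS_SKETCH:
        return false;
    case TC_CONSUMED_SKETCH:
        if (apply_fix)
            *ret = RX_SUCCESS_SKETCH;
        return true;
    case TC_SHOT_SKETCH:
    default:
        return true; /* leave *ret as RX_DROP */
    }
}

/* Caller modelled on __netif_receive_skb_core(): ret starts out as DROP. */
int receive_sketch(enum tc_verdict_sketch verdict, bool apply_fix)
{
    int ret = RX_DROP_SKETCH;

    if (handle_ingress_sketch(verdict, &ret, apply_fix))
        return ret;
    return RX_SUCCESS_SKETCH; /* stands in for delivery to the stack */
}
```

Without the fix, a redirected packet reports "drop" to the sender, which is the signal that kept TCP from advancing its sent-data state in the reproduction above.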
-
- 30 Sep 2022, 1 commit

Committed by Paolo Abeni

After commit 3226b158 ("net: avoid 32 x truesize under-estimation for tiny skbs"), we are observing 10-20% regressions in performance tests with small packets. The perf trace points to high pressure on the slab allocator.

This change tries to improve the allocation scheme for small packets using an idea originally suggested by Eric: a new per-CPU page frag is introduced and used in __napi_alloc_skb() to cope with small allocation requests.

To ensure that the above does not lead to excessive truesize underestimation, the frag size for small allocations is inflated to 1K, and all of the above is restricted to builds with a 4K page size.

Note that we need to update the run-time check introduced with commit fd9ea57f ("net: add napi_get_frags_check() helper") accordingly.

Alex suggested a smart page refcount scheme to reduce the number of atomic operations and deal properly with pfmemalloc pages.

Under a small packet UDP flood, I measure a 15% peak throughput increase.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Alexander H Duyck <alexanderduyck@fb.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/6b6f65957c59f86a353fc09a5127e83a32ab5999.1664350652.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
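The "inflate small frags to 1K in a 4K page" idea can be sketched with a toy allocator. The cache struct, the fixed page pool, and `small_frag_alloc()` are hypothetical. The sketch deliberately omits Alex's refcount scheme, pfmemalloc handling, and per-CPU placement. It only shows that every small request consumes, and is accounted as, a full 1 KiB slot.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: 4 KiB pages carved into fixed 1 KiB slots from a small pool. */
#define PAGE_SIZE_SKETCH  4096u
#define SMALL_FRAG_SKETCH 1024u
#define MAX_PAGES_SKETCH  4u

struct small_frag_cache {
    unsigned char pages[MAX_PAGES_SKETCH][PAGE_SIZE_SKETCH];
    size_t used_pages; /* pages handed out so far */
    size_t offset;     /* next free byte in the current page */
};

/* Any request up to 1 KiB gets a whole 1 KiB slot; *truesize reports the
 * inflated size so memory accounting does not under-estimate the cost. */
void *small_frag_alloc(struct small_frag_cache *c, size_t len, size_t *truesize)
{
    void *p;

    if (len == 0 || len > SMALL_FRAG_SKETCH)
        return NULL; /* not a "small" request in this model */

    if (c->used_pages == 0 || c->offset + SMALL_FRAG_SKETCH > PAGE_SIZE_SKETCH) {
        if (c->used_pages == MAX_PAGES_SKETCH)
            return NULL; /* pool exhausted in this toy model */
        c->used_pages++;
        c->offset = 0;
    }

    p = c->pages[c->used_pages - 1] + c->offset;
    c->offset += SMALL_FRAG_SKETCH;
    *truesize = SMALL_FRAG_SKETCH;
    return p;
}
```

With a 4 KiB page, exactly four small frags fit before a new page is needed. A 60-byte request is still charged 1 KiB of truesize, which is the inflation the message uses to keep truesize under-estimation bounded.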
-
- 24 Aug 2022, 5 commits

Committed by Kuniyuki Iwashima

While netdev_unregister_timeout_secs is being read, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: 5aa3afe1 ("net: make unregister netdev warning timeout configurable")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Kuniyuki Iwashima

While netdev_budget_usecs is being read, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: 7acf8a1e ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Kuniyuki Iwashima

While netdev_budget is being read, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: 51b0bded ("[NET]: Separate two usages of netdev_max_backlog.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Kuniyuki Iwashima

While netdev_tstamp_prequeue is being read, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.

Fixes: 3b098e2d ("net: Consistent skb timestamping")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Kuniyuki Iwashima

While netdev_max_backlog is being read, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.

While at it, remove the unnecessary spaces in the doc.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
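The reader-side pattern shared by these five commits can be sketched in userspace. `READ_ONCE_SKETCH` is a simplified stand-in for the kernel's scalar `READ_ONCE()`: a single volatile load via GCC/Clang `__typeof__`, without the kernel macro's extra type checks. The sysctl variable and reader function are hypothetical. Strictly conforming C11 code would use `<stdatomic.h>` for data shared between threads. The sketch mirrors the kernel's own memory-model convention instead.

```c
#include <assert.h>

/* Simplified READ_ONCE() for scalars: one volatile load that the compiler
 * may not tear, fuse, or repeat. The kernel macro adds type checks. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical sysctl that another thread may rewrite at any time. */
int netdev_max_backlog_sketch = 1000;

/* Reader: take one snapshot and make every decision from that snapshot. */
int backlog_has_room_sketch(unsigned int qlen)
{
    int limit = READ_ONCE_SKETCH(netdev_max_backlog_sketch);

    return limit > 0 && qlen <= (unsigned int)limit;
}
```

Taking a single snapshot means that a concurrent sysctl write can change which limit applies to the next packet, but it cannot produce a torn or inconsistent value within one check.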
-