- 30 3月, 2018 40 次提交
-
-
由 Kirill Tkhai 提交于
{un,}register_netdevice_notifier() iterate over all net namespaces hashed to net_namespace_list. But pernet_operations register and unregister netdevices in unhashed net namespace, and they are not seen for netdevice notifiers. This results in asymmetry: 1)Race with register_netdevice_notifier() pernet_operations::init(net) ... register_netdevice() ... call_netdevice_notifiers() ... ... nb is not called ... ... register_netdevice_notifier(nb) -> net skipped ... ... list_add_tail(&net->list, ..) ... Then, userspace stops using net, and it's destructed: pernet_operations::exit(net) unregister_netdevice() call_netdevice_notifiers() ... nb is called ... This always happens with net::loopback_dev, but it may be not the only device. 2)Race with unregister_netdevice_notifier() pernet_operations::init(net) register_netdevice() call_netdevice_notifiers() ... nb is called ... Then, userspace stops using net, and it's destructed: list_del_rcu(&net->list) ... pernet_operations::exit(net) unregister_netdevice_notifier(nb) -> net skipped dev_change_net_namespace() ... call_netdevice_notifiers() ... nb is not called ... unregister_netdevice() call_netdevice_notifiers() ... nb is not called ... This race is more danger, since dev_change_net_namespace() moves real network devices, which use not trivial netdevice notifiers, and if this will happen, the system will be left in unpredictable state. The patch closes the race. During the testing I found two places, where register_netdevice_notifier() is called from pernet init/exit methods (which led to deadlock) and fixed them (see previous patches). The review moved me to one more unusual registration place: raw_init() (can driver). It may be a reason of problems, if someone creates in-kernel CAN_RAW sockets, since they will be destroyed in exit method and raw_release() will call unregister_netdevice_notifier(). But grep over kernel tree does not show, someone creates such sockets from kernel space. Theoretically, there can be more places like this, and which are hidden from review, but we found them on the first bumping there (since there is no a race, it will be 100% reproducible). Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kirill Tkhai 提交于
Register netdevice notifier for every iptable entry is not good, since this breaks modularity, and the hidden synchronization is based on rtnl_lock(). This patch reworks the synchronization via new lock, while the rest of logic remains as it was before. This is required for the next patch. Tested via: while :; do unshare -n iptables -t mangle -A OUTPUT -j TEE --gateway 1.1.1.2 --oif lo; done Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kirill Tkhai 提交于
Currently, driver registers it from pernet_operations::init method, and this breaks modularity, because initialization of net namespace and netdevice notifiers are orthogonal actions. We don't have per-namespace netdevice notifiers; all of them are global for all devices in all namespaces. Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Mike Looijmans says: ==================== of_net: Implement of_get_nvmem_mac_address helper Posted this as a small set now, with an (optional) second patch that shows how the changes work and what I've used to test the code on a Topic Miami board. I've taken the liberty to add appropriate "Acked" and "Review" tags. v4: Replaced "6" with ETH_ALEN v3: Add patch that implements mac in nvmem for the Cadence MACB controller Remove the integrated of_get_mac_address call v2: Use of_nvmem_cell_get to avoid needing the assiciated device Use void* instead of char* Add devicetree binding doc ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Mike Looijmans 提交于
Call of_get_nvmem_mac_address() to fetch the MAC address from an nvmem cell, if one is provided in the device tree. This allows the address to be stored in an I2C EEPROM device for example. Signed-off-by: NMike Looijmans <mike.looijmans@topic.nl> Acked-by: NNicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Mike Looijmans 提交于
It's common practice to store MAC addresses for network interfaces into nvmem devices. However the code to actually do this in the kernel lacks, so this patch adds of_get_nvmem_mac_address() for drivers to obtain the address from an nvmem cell provider. This is particulary useful on devices where the ethernet interface cannot be configured by the bootloader, for example because it's in an FPGA. Signed-off-by: NMike Looijmans <mike.looijmans@topic.nl> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Jakub Kicinski says: ==================== nfp: flower: handle MTU changes This set improves MTU handling for flower offload. The max MTU is correctly capped and physical port MTU is communicated to the FW (and indirectly HW). ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Hurley 提交于
Trigger a port mod message to request an MTU change on the NIC when any physical port representor is assigned a new MTU value. The driver waits 10 msec for an ack that the FW has set the MTU. If no ack is received the request is rejected and an appropriate warning flagged. Rather than maintain an MTU queue per repr, one is maintained per app. Because the MTU ndo is protected by the rtnl lock, there can never be contention here. Portmod messages from the NIC are also protected by rtnl so we first check if the portmod is an ack and, if so, handle outside rtnl and the cmsg work queue. Acks are detected by the marking of a bit in a portmod response. They are then verfied by checking the port number and MTU value expected by the app. If the expected MTU is 0 then no acks are currently expected. Also, ensure that the packet headroom reserved by the flower firmware is considered when accepting an MTU change on any repr. Signed-off-by: NJohn Hurley <john.hurley@netronome.com> Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Hurley 提交于
Rename the 'change_mtu' app callback to 'check_mtu'. This is called whenever an MTU change is requested on a netdev. It can reject the change but is not responsible for implementing it. Introduce a new 'repr_change_mtu' app callback that is hit when the MTU of a repr is to be changed. This is responsible for performing the MTU change and verifying it. Signed-off-by: NJohn Hurley <john.hurley@netronome.com> Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Florian Fainelli says: ==================== phylink: API changes This patch series contains two API changes to PHYLINK which will later be used by DSA to migrate to PHYLINK. Because these are API changes that impact other outstanding work (e.g: MVPP2) I would rather get them included sooner to minimize conflicts. Thank you! Changes in v2: - added missing documentation to mac_link_{up,down} that the interface must be configured in mac_config() - added Russell's, Andrew's and my tags ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Russell King 提交于
Provide a pointer to the SFP bus in struct net_device, so that the ethtool module EEPROM methods can access the SFP directly, rather than needing every user to provide a hook for it. Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk> Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk> Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Fainelli 提交于
In preparation for having DSA transition entirely to PHYLINK, we need to pass a PHY interface type to the mac_link_{up,down} callbacks because we may have to make decisions on that (e.g: turn on/off RGMII interfaces etc.). We do not pass an entire phylink_link_state because not all parameters (pause, duplex etc.) are defined when the link is down, only link and interface are. Update mvneta accordingly since it currently implements phylink_mac_ops. Acked-by: NRussell King <rmk+kernel@armlinux.org.uk> Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Acked-by: NRussell King <rmk+kernel@armlinux.org.uk> Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ronak Doshi 提交于
Shrikrishna Khare would no longer maintain the vmxnet3 driver. Taking over the role of vmxnet3 maintainer. Signed-off-by: NRonak Doshi <doshir@vmware.com> Signed-off-by: NShrikrishna Khare <skhare@vmware.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Florian Fainelli says: ==================== net: Broadcom drivers coalescing fixes Following Tal's review of the adaptive RX/TX coalescing feature added to the SYSTEMPORT and GENET driver a number of things showed up: - adaptive TX coalescing is not actually a good idea with the current way the estimator will program the ring, this results in a higher CPU load, NAPI on TX already does a reasonably good job at maintaining the interrupt count low - both SYSTEMPORT and GENET would suffer from the same issues while configuring coalescing parameters where the values would just not be applied correctly based on user settings, so we fix that too Tal, thanks again for your feedback, I would appreciate if you could review that the new behavior appears to be implemented correctly. Thanks! Changes in v2: - added Tal's reviewed-by to the first patch - split DIM initialization from coalescing parameters initialization - avoid duplicating the same code in bcmgenet_set_coalesce() when configuring RX rings - fixed the condition where default DIM parameters would be applied when adaptive RX coalescing would be enabled, do this only if it was disabled before ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Fainelli 提交于
There were a number of issues with setting the RX coalescing parameters: - we would not be preserving values that would have been configured across close/open calls, instead we would always reset to no timeout and 1 interrupt per packet, this would also prevent DIM from setting its default usec/pkts values - when adaptive RX would be turned on, we woud not be fetching the default parameters, we would stay with no timeout/1 packet per interrupt until the estimator kicks in and changes that - finally disabling adaptive RX coalescing while providing parameters would not be honored, and we would stay with whatever DIM had previously determined instead of the user requested parameters Fixes: 9f4ca058 ("net: bcmgenet: Add support for adaptive RX coalescing") Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NTal Gilboa <talgi@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Fainelli 提交于
There were a number of issues with setting the RX coalescing parameters: - we would not be preserving values that would have been configured across close/open calls, instead we would always reset to no timeout and 1 interrupt per packet, this would also prevent DIM from setting its default usec/pkts values - when adaptive RX would be turned on, we woud not be fetching the default parameters, we would stay with no timeout/1 packet per interrupt until the estimator kicks in and changes that - finally disabling adaptive RX coalescing while providing parameters would not be honored, and we would stay with whatever DIM had previously determined instead of the user requested parameters Fixes: b6e0e875 ("net: systemport: Implement adaptive interrupt coalescing") Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NTal Gilboa <talgi@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Fainelli 提交于
Adaptive TX coalescing is not currently giving us any advantages and ends up making the CPU spin more frequently until TX completion. Deny and disable adaptive TX coalescing for now and rely on static configuration, we can always add it back later. Reviewed-by: NTal Gilboa <talgi@mellanox.com> Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gal Pressman 提交于
NETIF_F_HW_VLAN_[CS]TAG_FILTER features require more than just a bit flip in dev->features in order to keep the driver in a consistent state. These features notify the driver of each added/removed vlan, but toggling of vlan-filter does not notify the driver accordingly for each of the existing vlans. This patch implements a similar solution to NETIF_F_RX_UDP_TUNNEL_PORT behavior (which notifies the driver about UDP ports in the same manner that vids are reported). Each toggling of the features propagates to the 8021q module, which iterates over the vlans and call add/kill ndo accordingly. Signed-off-by: NGal Pressman <galp@mellanox.com> Reviewed-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Yongjun 提交于
Fix to return a negative error code from the hash filter init error handling case instead of 0, as done elsewhere in this function. Fixes: 5c31254e ("cxgb4: initialize hash-filter configuration") Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Merge tag 'wireless-drivers-next-for-davem-2018-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.17 Smaller new features to various drivers but nothing really out of ordinary. Major changes: ath10k * enable chip temperature measurement for QCA6174/QCA9377 * add firmware memory dump for QCA9984 * enable buffer STA on TDLS link for QCA6174 * support different beacon internals in multiple interface scenario for QCA988X/QCA99X0/QCA9984/QCA4019 iwlwifi * support for new PCI IDs for the 9000 family * support for a new firmware API version * support for advanced dwell and Optimized Connectivity Experience (OCE) in scanning btrsi * fix kconfig dependencies wil6210 * support multiple virtual interfaces ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Merge tag 'mac80211-next-for-davem-2018-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== We have a fair number of patches, but many of them are from the first bullet here: * EAPoL-over-nl80211 from Denis - this will let us fix some long-standing issues with bridging, races with encryption and more * DFS offload support from the qtnfmac folks * regulatory database changes for the new ETSI adaptivity requirements * various other fixes and small enhancements ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Andrew Lunn says: ==================== Add ATU/VTU statistics Previous patches have added basic support for Address Translation Unit and VLAN translation Unit violation interrupts. Add statistics counters for when these occur, which can be accessed using ethtool. Downgrade one of the particularly spammy warnings from VTU violations to debug only, now that we have a counter for it. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andrew Lunn 提交于
VTU miss violations can happen under normal conditions. Don't spam the kernel log, downgrade the output to debug level only. The statistics counter will indicate it is happening, if anybody not debugging is interested. Signed-off-by: NAndrew Lunn <andrew@lunn.ch> Reported-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andrew Lunn 提交于
Count the numbers of various ATU and VTU violation statistics and return them as part of the ethtool -S statistics. Signed-off-by: NAndrew Lunn <andrew@lunn.ch> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnd Bergmann 提交于
The proc file cleanup left a label possibly unused: net/sctp/protocol.c: In function 'sctp_defaults_init': net/sctp/protocol.c:1304:1: error: label 'err_init_proc' defined but not used [-Werror=unused-label] This adds an #ifdef around it to match the respective 'goto'. Fixes: d47d08c8 ("sctp: use proc_remove_subtree()") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Yongjun 提交于
Use the module_pci_driver() macro to make the code simpler by eliminating module_init and module_exit calls. Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Yongjun 提交于
Fixes the following sparse warning: drivers/net/ethernet/broadcom/genet/bcmgenet.c:1351:16: warning: Using plain integer as NULL pointer Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Acked-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dan Carpenter 提交于
The skb_segment() function returns error pointers on error. It never returns NULL. Fixes: 76db8087 ("net: bpf: add a test for skb_segment in test_bpf module") Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NYonghong Song <yhs@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Russell King 提交于
Cotsworks modules fail the checksums - it appears that Cotsworks reprograms the EEPROM at the end of production with the final product information (serial, date code, and exact part number for module options) and fails to update the checksum. Work around this by detecting the Cotsworks name in the manufacturer field, and reducing the checksum failures to warnings rather than a hard error. Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Sudarsana Reddy Kalluru says: ==================== qed*: Flash upgrade support. The patch series adds adapter flash upgrade support for qed/qede drivers. Please consider applying it to net-next branch. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sudarsana Reddy Kalluru 提交于
The patch adds ethtool callback implementation for flash update. Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NAriel Elior <ariel.elior@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sudarsana Reddy Kalluru 提交于
This patch adds the required driver support for updating the flash or non volatile memory of the adapter. At highlevel, flash upgrade comprises of reading the flash images from the input file, validating the images and writing them to the respective paritions. Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NAriel Elior <ariel.elior@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sudarsana Reddy Kalluru 提交于
This patch adds APIs for flash access. Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NAriel Elior <ariel.elior@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sudarsana Reddy Kalluru 提交于
Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NAriel Elior <ariel.elior@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sudarsana Reddy Kalluru 提交于
This patch adds support for populating the flash image attributes. Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NAriel Elior <ariel.elior@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Kalderon 提交于
This FW contains several fixes and features RDMA Features - SRQ support - XRC support - Memory window support - RDMA low latency queue support - RDMA bonding support RDMA bug fixes - RDMA remote invalidate during retransmit fix - iWARP MPA connect interop issue with RTR fix - iWARP Legacy DPM support - Fix MPA reject flow - iWARP error handling - RQ WQE validation checks MISC - Fix some HSI types endianity - New Restriction: vlan insertion in core_tx_bd_data can't be set for LB packets ETH - HW QoS offload support - Fix vlan, dcb and sriov flow of VF sending a packet with inband VLAN tag instead of default VLAN - Allow GRE version 1 offloads in RX flow - Allow VXLAN steering iSCSI / FcoE - Fix bd availability checking flow - Support 256th sge proerly in iscsi/fcoe retransmit - Performance improvement - Fix handle iSCSI command arrival with AHS and with immediate - Fix ipv6 traffic class configuration DEBUG - Update debug utilities Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: NTomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: NManish Rangankar <Manish.Rangankar@cavium.com> Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com> Acked-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
IPv4 was changed in commit 52a773d6 ("net: Export ip fragment sysctl to unprivileged users") The only sysctl that is not per-netns is not used : ip6frag_secret_interval Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Nikolay Borisov <kernel@kyup.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Intiyaz Basha 提交于
During heavy tx traffic, control messages (sent by liquidio driver to NIC firmware) sometimes do not get processed in a timely manner. Reason is: the low-level metadata of control messages and that of egress network packets indicate that they have the same priority. Fix it by setting a higher priority for control messages through the new ctrl_qpg field in the oct_txpciq struct. It is the NIC firmware that does the actual setting of priority by writing to the new ctrl_qpg field; the host driver treats that value as opaque and just assigns it to pki_ih3->qpg Signed-off-by: NIntiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: NFelix Manlunas <felix.manlunas@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
David Ahern says: ==================== net: Allow FIB notifiers to fail add and replace I wanted to revisit how resource overload is handled for hardware offload of FIB entries and rules. At the moment, the in-kernel fib notifier can tell a driver about a route or rule add, replace, and delete, but the notifier can not affect the action. Specifically, in the case of mlxsw if a route or rule add is going to overflow the ASIC resources the only recourse is to abort hardware offload. Aborting offload is akin to taking down the switch as the path from data plane to the control plane simply can not support the traffic bandwidth of the front panel ports. Further, the current state of FIB notifiers is inconsistent with other resources where a driver can affect a user request - e.g., enslavement of a port into a bridge or a VRF. As a result of the work done over the past 3+ years, I believe we are at a point where we can bring consistency to the stack and offloads, and reliably allow the FIB notifiers to fail a request, pushing an error along with a suitable error message back to the user. Rather than aborting offload when the switch is out of resources, userspace is simply prevented from adding more routes and has a clear indication of why. This set does not resolve the corner case where rules or routes not supported by the device are installed prior to the driver getting loaded and registering for FIB notifications. In that case, hardware offload has not been established and it can refuse to offload anything, sending errors back to userspace via extack. Since conceptually the driver owns the netdevices associated with its asic, this corner case mainly applies to unsupported rules and any races during the bringup phase. Patch 1 fixes call_fib_notifiers to extract the errno from the encoded response from handlers. Patches 2-5 allow the call to call_fib_notifiers to fail the add or replace of a route or rule. Patch 6 adds a simple resource controller to netdevsim to illustrate how a FIB resource controller can limit the number of route entries. Changes since RFC - correct return code for call_fib_notifier - dropped patch 6 exporting devlink symbols - limited example resource controller to init_net only - updated Kconfig for netdevsim to use MAY_USE_DEVLINK - updated cover letter regarding startup case noted by Ido ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Add devlink support to netdevsim and use it to implement a simple, profile based resource controller. Only one controller is needed per namespace, so the first netdevsim netdevice in a namespace registers with devlink. If that device is deleted, the resource settings are deleted. The resource controller allows a user to limit the number of IPv4 and IPv6 FIB entries and FIB rules. The resource paths are: /IPv4 /IPv4/fib /IPv4/fib-rules /IPv6 /IPv6/fib /IPv6/fib-rules The IPv4 and IPv6 top level resources are unlimited in size and can not be changed. From there, the number of FIB entries and FIB rule entries are unlimited by default. A user can specify a limit for the fib and fib-rules resources: $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 96 $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib-rules size 16 $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib size 64 $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib-rules size 16 $ devlink dev reload netdevsim/netdevsim0 such that the number of rules or routes is limited (96 ipv4 routes in the example above): $ for n in $(seq 1 32); do ip ro add 10.99.$n.0/24 dev eth1; done Error: netdevsim: Exceeded number of supported fib entries. $ devlink resource show netdevsim/netdevsim0 netdevsim/netdevsim0: name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables non resources: name fib size 96 occ 96 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables ... With this template in place for resource management, it is fairly trivial to extend and shows one way to implement a simple counter based resource controller typical of network profiles. Currently, devlink only supports initial namespace. Code is in place to adapt netdevsim to a per namespace controller once the network namespace issues are resolved. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-