- 18 7月, 2017 3 次提交
-
-
由 David S. Miller 提交于
No longer used. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
For performance reasons we want to avoid updating the tail pointer in the driver tx ring as much as possible. To accomplish this we add batching support to the redirect path in XDP. This adds another ndo op "xdp_flush" that is used to inform the driver that it should bump the tail pointer on the TX ring. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
This adds support for a bpf_redirect helper function to the XDP infrastructure. For now this only supports redirecting to the egress path of a port. In order to support drivers handling a xdp_buff natively this patches uses a new ndo operation ndo_xdp_xmit() that takes pushes a xdp_buff to the specified device. If the program specifies either (a) an unknown device or (b) a device that does not support the operation a BPF warning is thrown and the XDP_ABORTED error code is returned. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 7月, 2017 1 次提交
-
-
由 WANG Cong 提交于
As Hongjun/Nicolas summarized in their original patch: " When a device changes from one netns to another, it's first unregistered, then the netns reference is updated and the dev is registered in the new netns. Thus, when a slave moves to another netns, it is first unregistered. This triggers a NETDEV_UNREGISTER event which is caught by the bonding driver. The driver calls bond_release(), which calls dev_set_mtu() and thus triggers NETDEV_CHANGEMTU (the device is still in the old netns). " This is a very special case, because the device is being unregistered no one should still care about the NETDEV_CHANGEMTU event triggered at this point, we can avoid broadcasting this event on this path, and avoid touching inetdev_event()/addrconf_notify() path. It requires to export __dev_set_mtu() to bonding driver. Reported-by: NHongjun Li <hongjun.li@6wind.com> Reported-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 6月, 2017 1 次提交
-
-
由 Mintz, Yuval 提交于
Looks like commit f663dd9a ("net: core: explicitly select a txq before doing l2 forwarding") has removed the need for this dedicated xmit function [it even explicitly states so in its commit log message] but it hasn't removed the definition of the ndo. Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com> CC: Jason Wang <jasowang@redhat.com> CC: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 6月, 2017 3 次提交
-
-
由 Jakub Kicinski 提交于
Extend the XDP_ATTACHED_* values to include offloaded mode. Let drivers report whether program is installed in the driver or the HW by changing the prog_attached field from bool to u8 (type of the netlink attribute). Exploit the fact that the value of XDP_ATTACHED_DRV is 1, therefore since all drivers currently assign the mode with double negation: mode = !!xdp_prog; no drivers have to be modified. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Add an installation-time flag for requesting that the program be installed only if it can be offloaded to HW. Internally new command for ndo_xdp is added, this way we avoid putting checks into drivers since they all return -EINVAL on an unknown command. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Pass XDP flags to the xdp ndo. This will allow drivers to look at the mode flags and make decisions about offload. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 6月, 2017 1 次提交
-
-
由 Martin KaFai Lau 提交于
Expose prog_id through IFLA_XDP_PROG_ID. This patch makes modification to generic_xdp. The later patches will modify other xdp-supported drivers. prog_id is added to struct net_dev_xdp. iproute2 patch will be followed. Here is how the 'ip link' will look like: > ip link show eth0 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp(prog_id:1) qdisc fq_codel state UP mode DEFAULT group default qlen 1000 Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 6月, 2017 1 次提交
-
-
由 Magnus Damm 提交于
Update ->ndo_change_mtu() callback comment to remove text about returning error in case of undefined callback. This change makes the comment match the existing code behavior. Signed-off-by: NMagnus Damm <damm+renesas@opensource.se> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 6月, 2017 3 次提交
-
-
由 David Ahern 提交于
Roopa reported attempts to delete a bond device that is referenced in a multipath route is hanging: $ ifdown bond2 # ifupdown2 command that deletes virtual devices unregister_netdevice: waiting for bond2 to become free. Usage count = 2 Steps to reproduce: echo 1 > /proc/sys/net/ipv6/conf/all/ignore_routes_with_linkdown ip link add dev bond12 type bond ip link add dev bond13 type bond ip addr add 2001:db8:2::0/64 dev bond12 ip addr add 2001:db8:3::0/64 dev bond13 ip route add 2001:db8:33::0/64 nexthop via 2001:db8:2::2 nexthop via 2001:db8:3::2 ip link del dev bond12 ip link del dev bond13 The root cause is the recent change to keep routes on a linkdown. Update the check to detect when the device is unregistering and release the route for that case. Fixes: a1a22c12 ("net: ipv6: Keep nexthop of multipath route on admin down") Reported-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
We need to push the chain index down to the drivers, so they have the information to which chain the rule belongs. For now, no driver supports multichain offload, so only chain 0 is supported. This is needed to prevent chain squashes during offload for now. Later this will be used to implement multichain offload. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 5月, 2017 1 次提交
-
-
由 Miroslav Lichvar 提交于
Since commit b6858177 ("net: Make skb->skb_iif always track skb->dev") skbs don't have the original index of the interface which received the packet. This information is now needed for a new control message related to hardware timestamping. Instead of adding a new field to skb, we can find the device by the NAPI ID if it is available, i.e. CONFIG_NET_RX_BUSY_POLL is enabled and the driver is using NAPI. Add dev_get_by_napi_id() and also skb_napi_id() to hide the CONFIG_NET_RX_BUSY_POLL ifdef. CC: Richard Cochran <richardcochran@gmail.com> Suggested-by: NWillem de Bruijn <willemb@google.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NMiroslav Lichvar <mlichvar@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 5月, 2017 3 次提交
-
-
由 Davide Caratti 提交于
skb_csum_hwoffload_help() uses netdev features and skb->csum_not_inet to determine if skb needs software computation of Internet Checksum or crc32c (or nothing, if this computation can be done by the hardware). Use it in place of skb_checksum_help() in validate_xmit_skb() to avoid corruption of non-GSO SCTP packets having skb->ip_summed equal to CHECKSUM_PARTIAL. While at it, remove references to skb_csum_off_chk* functions, since they are not present anymore in Linux _ see commit cf53b1da ("Revert "net: Add driver helper functions to determine checksum offloadability""). Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Davide Caratti 提交于
This bit was introduced with commit 5a212329 ("net: Support for csum_bad in skbuff") to reduce the stack workload when processing RX packets carrying a wrong Internet Checksum. Up to now, only one driver and GRO core are setting it. Suggested-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Davide Caratti 提交于
skb_crc32c_csum_help is like skb_checksum_help, but it is designed for checksumming SCTP packets using crc32c (see RFC3309), provided that libcrc32c.ko has been loaded before. In case libcrc32c is not loaded, invoking skb_crc32c_csum_help on a skb results in one the following printouts: warn_crc32c_csum_update: attempt to compute crc32c without libcrc32c.ko warn_crc32c_csum_combine: attempt to compute crc32c without libcrc32c.ko Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 5月, 2017 1 次提交
-
-
由 Alexey Dobriyan 提交于
4 billion packet queue is something unthinkable so use 32-bit value for now. Space savings on x86_64: add/remove: 0/0 grow/shrink: 3/70 up/down: 16/-131 (-115) function old new delta change_tx_queue_len 94 108 +14 qdisc_create 1176 1177 +1 alloc_netdev_mqs 1124 1125 +1 xenvif_alloc 533 532 -1 x25_asy_setup 167 166 -1 ... tun_queue_resize 945 940 -5 pfifo_fast_enqueue 167 162 -5 qfq_init_qdisc 168 158 -10 tap_queue_resize 810 799 -11 transmit 719 698 -21 Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 5月, 2017 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
Sphinx is very pedantic with regards to identation and escape sequences: ./include/net/sock.h:1967: ERROR: Unexpected indentation. ./include/net/sock.h:1969: ERROR: Unexpected indentation. ./include/net/sock.h:1970: WARNING: Block quote ends without a blank line; unexpected unindent. ./include/net/sock.h:1971: WARNING: Block quote ends without a blank line; unexpected unindent. ./include/net/sock.h:2268: WARNING: Inline emphasis start-string without end-string. ./net/core/sock.c:2686: ERROR: Unexpected indentation. ./net/core/sock.c:2687: WARNING: Block quote ends without a blank line; unexpected unindent. ./net/core/datagram.c:182: WARNING: Inline emphasis start-string without end-string. ./include/linux/netdevice.h:1444: ERROR: Unexpected indentation. ./drivers/net/phy/phy.c:381: ERROR: Unexpected indentation. ./drivers/net/phy/phy.c:382: WARNING: Block quote ends without a blank line; unexpected unindent. - Fix spacing where needed; - Properly escape constants; - Use a literal block for a race description. No functional changes. Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>
-
- 12 5月, 2017 1 次提交
-
-
由 Daniel Borkmann 提交于
While working on the iproute2 generic XDP frontend, I noticed that as of right now it's possible to have native *and* generic XDP programs loaded both at the same time for the case when a driver supports native XDP. The intended model for generic XDP from b5cdae32 ("net: Generic XDP") is, however, that only one out of the two can be present at once which is also indicated as such in the XDP netlink dump part. The main rationale for generic XDP is to ease accessibility (in case a driver does not yet have XDP support) and to generically provide a semantical model as an example for driver developers wanting to add XDP support. The generic XDP option for an XDP aware driver can still be useful for comparing and testing both implementations. However, it is not intended to have a second XDP processing stage or layer with exactly the same functionality of the first native stage. Only reason could be to have a partial fallback for future XDP features that are not supported yet in the native implementation and we probably also shouldn't strive for such fallback and instead encourage native feature support in the first place. Given there's currently no such fallback issue or use case, lets not go there yet if we don't need to. Therefore, change semantics for loading XDP and bail out if the user tries to load a generic XDP program when a native one is present and vice versa. Another alternative to bailing out would be to handle the transition from one flavor to another gracefully, but that would require to bring the device down, exchange both types of programs, and bring it up again in order to avoid a tiny window where a packet could hit both hooks. Given this complicates the logic for just a debugging feature in the native case, I went with the simpler variant. For the dump, remove IFLA_XDP_FLAGS that was added with b5cdae32 and reuse IFLA_XDP_ATTACHED for indicating the mode. Dumping all or just a subset of flags that were used for loading the XDP prog is suboptimal in the long run since not all flags are useful for dumping and if we start to reuse the same flag definitions for load and dump, then we'll waste bit space. What we really just want is to dump the mode for now. Current IFLA_XDP_ATTACHED semantics are: nothing was installed (0), a program is running at the native driver layer (1). Thus, add a mode that says that a program is running at generic XDP layer (2). Applications will handle this fine in that older binaries will just indicate that something is attached at XDP layer, effectively this is similar to IFLA_XDP_FLAGS attr that we would have had modulo the redundancy. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 5月, 2017 1 次提交
-
-
由 Jakub Kicinski 提交于
Drivers usually have a number of restrictions for running XDP - most common being buffer sizes, LRO and number of rings. Even though some drivers try to be helpful and print error messages experience shows that users don't often consult kernel logs on netlink errors. Try to use the new extended ack mechanism to carry the message back to user space. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 4月, 2017 1 次提交
-
-
由 Zhang Shengju 提交于
This patch updates the comment for netif_dormant() function to reflect the intended usage. Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 4月, 2017 2 次提交
-
-
由 Eric Dumazet 提交于
(struct net_device, xdp_prog) field should be moved in RX cache lines, reducing latencies when a single packet is received on idle host, since netif_elide_gro() needs it. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This provides a generic SKB based non-optimized XDP path which is used if either the driver lacks a specific XDP implementation, or the user requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE. It is arguable that perhaps I should have required something like this as part of the initial XDP feature merge. I believe this is critical for two reasons: 1) Accessibility. More people can play with XDP with less dependencies. Yes I know we have XDP support in virtio_net, but that just creates another depedency for learning how to use this facility. I wrote this to make life easier for the XDP newbies. 2) As a model for what the expected semantics are. If there is a pure generic core implementation, it serves as a semantic example for driver folks adding XDP support. One thing I have not tried to address here is the issue of XDP_PACKET_HEADROOM, thanks to Daniel for spotting that. It seems incredibly expensive to do a skb_cow(skb, XDP_PACKET_HEADROOM) or whatever even if the XDP program doesn't try to push headers at all. I think we really need the verifier to somehow propagate whether certain XDP helpers are used or not. v5: - Handle both negative and positive offset after running prog - Fix mac length in XDP_TX case (Alexei) - Use rcu_dereference_protected() in free_netdev (kbuild test robot) v4: - Fix MAC header adjustmnet before calling prog (David Ahern) - Disable LRO when generic XDP is installed (Michael Chan) - Bypass qdisc et al. on XDP_TX and record the event (Alexei) - Do not perform generic XDP on reinjected packets (DaveM) v3: - Make sure XDP program sees packet at MAC header, push back MAC header if we do XDP_TX. (Alexei) - Elide GRO when generic XDP is in use. (Alexei) - Add XDP_FLAG_SKB_MODE flag which the user can use to request generic XDP even if the driver has an XDP implementation. (Alexei) - Report whether SKB mode is in use in rtnl_xdp_fill() via XDP_FLAGS attribute. (Daniel) v2: - Add some "fall through" comments in switch statements based upon feedback from Andrew Lunn - Use RCU for generic xdp_prog, thanks to Johannes Berg. Tested-by: NAndy Gospodarek <andy@greyhouse.net> Tested-by: NJesper Dangaard Brouer <brouer@redhat.com> Tested-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 4月, 2017 1 次提交
-
-
由 Matthew Whitehead 提交于
Constants used for tuning are generally a bad idea, especially as hardware changes over time. Replace the constant 2 jiffies with sysctl variable netdev_budget_usecs to enable sysadmins to tune the softirq processing. Also document the variable. For example, a very fast machine might tune this to 1000 microseconds, while my regression testing 486DX-25 needs it to be 4000 microseconds on a nearly idle network to prevent time_squeeze from being incremented. Version 2: changed jiffies to microseconds for predictable units. Signed-off-by: NMatthew Whitehead <tedheadster@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 4月, 2017 1 次提交
-
-
由 Jiri Pirko 提交于
To find out if a netdev is an OVS port. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Reviewed-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 4月, 2017 2 次提交
-
-
由 Steffen Klassert 提交于
This patch adds all the bits that are needed to do IPsec hardware offload for IPsec states and ESP packets. We add xfrmdev_ops to the net_device. xfrmdev_ops has function pointers that are needed to manage the xfrm states in the hardware and to do a per packet offloading decision. Joint work with: Ilan Tayari <ilant@mellanox.com> Guy Shapiro <guysh@mellanox.com> Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: NGuy Shapiro <guysh@mellanox.com> Signed-off-by: NIlan Tayari <ilant@mellanox.com> Signed-off-by: NYossi Kuperman <yossiku@mellanox.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
This patch adds netdev features to configure IPsec offloads. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 13 4月, 2017 2 次提交
-
-
由 Alexey Dobriyan 提交于
This field is never big enough to warrant 16-bitness. 8-bit accesses enjoy shorted encoding on i386/x86_64 than 16-bit accesses: add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-10 (-10) function old new delta loopback_setup 169 164 -5 ether_setup 148 143 -5 Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexey Dobriyan 提交于
Using 16-bit ->hh_len doesn't save any memory, save some .text instead: add/remove: 0/0 grow/shrink: 1/6 up/down: 2/-19 (-17) function old new delta neigh_update 2312 2314 +2 fwnet_header_cache 199 197 -2 eth_header_cache 101 99 -2 ip6_finish_output2 2371 2368 -3 vrf_finish_output6 1522 1518 -4 vrf_finish_output 1413 1409 -4 ip_finish_output2 1627 1623 -4 Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 3月, 2017 1 次提交
-
-
由 Andrew Lunn 提交于
There is an include loop between netdevice.h, dsa.h, devlink.h because of NETDEV_ALIGN, making it impossible to use devlink structures in dsa.h. Break this loop by taking dsa.h out of netdevice.h, add a forward declaration of dsa_switch_tree and netdev_set_default_ethtool_ops() function, which is what netdevice.h requires. No longer having dsa.h in netdevice.h means the includes in dsa.h no longer get included. This breaks a few other files which depend on these includes. Add these directly in the affected file. Signed-off-by: NAndrew Lunn <andrew@lunn.ch> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 3月, 2017 1 次提交
-
-
由 Amritha Nambiar 提交于
The configurable priority to traffic class mapping and the user specified queue ranges are used to configure the traffic class, overriding the hardware defaults when the 'hw' option is set to 0. However, when the 'hw' option is non-zero, the hardware QOS defaults are used. This patch makes it so that we can pass the data the user provided to ndo_setup_tc. This allows us to pull in the queue configuration if the user requested it as well as any additional hardware offload type requested by using a value other than 1 for the hw value. Finally it also provides a means for the device driver to return the level supported for the offload type via the qopt->hw value. Previously we were just always assuming the value to be 1, in the future values beyond just 1 may be supported. Signed-off-by: NAmritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 3月, 2017 1 次提交
-
-
由 Eric Dumazet 提交于
While playing with mlx4 hardware timestamping of RX packets, I found that some packets were received by TCP stack with a ~200 ms delay... Since the timestamp was provided by the NIC, and my probe was added in tcp_v4_rcv() while in BH handler, I was confident it was not a sender issue, or a drop in the network. This would happen with a very low probability, but hurting RPC workloads. A NAPI driver normally arms the IRQ after the napi_complete_done(), after NAPI_STATE_SCHED is cleared, so that the hard irq handler can grab it. Problem is that if another point in the stack grabs NAPI_STATE_SCHED bit while IRQ are not disabled, we might have later an IRQ firing and finding this bit set, right before napi_complete_done() clears it. This can happen with busy polling users, or if gro_flush_timeout is used. But some other uses of napi_schedule() in drivers can cause this as well. thread 1 thread 2 (could be on same cpu, or not) // busy polling or napi_watchdog() napi_schedule(); ... napi->poll() device polling: read 2 packets from ring buffer Additional 3rd packet is available. device hard irq // does nothing because NAPI_STATE_SCHED bit is owned by thread 1 napi_schedule(); napi_complete_done(napi, 2); rearm_irq(); Note that rearm_irq() will not force the device to send an additional IRQ for the packet it already signaled (3rd packet in my example) This patch adds a new NAPI_STATE_MISSED bit, that napi_schedule_prep() can set if it could not grab NAPI_STATE_SCHED Then napi_complete_done() properly reschedules the napi to make sure we do not miss something. Since we manipulate multiple bits at once, use cmpxchg() like in sk_busy_loop() to provide proper transactions. In v2, I changed napi_watchdog() to use a relaxed variant of napi_schedule_prep() : No need to set NAPI_STATE_MISSED from this point. In v3, I added more details in the changelog and clears NAPI_STATE_MISSED in busy_poll_stop() In v4, I added the ideas given by Alexander Duyck in v3 review Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 2月, 2017 2 次提交
-
-
由 Steffen Klassert 提交于
The upcomming IPsec ESP gro callbacks will consume the skb, so prepare for that. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
Add a skb_gro_flush_final helper to prepare for consuming skbs in call_gro_receive. We will extend this helper to not touch the skb if the skb is consumed by a gro callback with a followup patch. We need this to handle the upcomming IPsec ESP callbacks as they reinject the skb to the napi_gro_receive asynchronous. The handler is used in all gro_receive functions that can call the ESP gro handlers. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 14 2月, 2017 1 次提交
-
-
由 Tobias Klauser 提交于
garp_port is only used in net/802/garp.c which is only compiled with CONFIG_GARP enabled. Same goes for mrp_port which is only used in net/802/mrp.c with CONFIG_MRP enabled. Only include the two members in struct net_device if their respective CONFIG_* is enabled. This saves a few bytes in struct net_device in case CONFIG_GARP or CONFIG_MRP are not enabled. Signed-off-by: NTobias Klauser <tklauser@distanz.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 2月, 2017 1 次提交
-
-
由 Willem de Bruijn 提交于
The stack must not pass packets to device drivers that are shorter than the minimum link layer header length. Previously, packet sockets would drop packets smaller than or equal to dev->hard_header_len, but this has false positives. Zero length payload is used over Ethernet. Other link layer protocols support variable length headers. Support for validation of these protocols removed the min length check for all protocols. Introduce an explicit dev->min_header_len parameter and drop all packets below this value. Initially, set it to non-zero only for Ethernet and loopback. Other protocols can follow in a patch to net-next. Fixes: 9ed988cd ("packet: validate variable length ll headers") Reported-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Acked-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 2月, 2017 1 次提交
-
-
由 Ido Schimmel 提交于
In commit 18bfb924 ("net: introduce default neigh_construct/destroy ndo calls for L2 upper devices") we added these ndos to stacked devices such as team and bond, so that calls will be propagated to mlxsw. However, previous commit removed the reliance on these ndos and no new users of these ndos have appeared since above mentioned commit. We can therefore safely remove this dead code. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 2月, 2017 1 次提交
-
-
由 Eric Dumazet 提交于
All __napi_complete() callers have been converted to use the more standard napi_complete_done(), we can now remove this NAPI method for good. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 2月, 2017 1 次提交
-
-
由 Eric Dumazet 提交于
We added generic support for busy polling in NAPI layer in linux-4.5 No network driver uses ndo_busy_poll() anymore, we can get rid of the pointer in struct net_device_ops, and its use in sk_busy_loop() Saves NETIF_F_BUSY_POLL features bit. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-