- 27 8月, 2016 2 次提交
-
-
由 Ido Schimmel 提交于
switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of port netdevs so that packets being flooded by the device won't be flooded twice. It works by assigning a unique identifier (the ifindex of the first bridge port) to bridge ports sharing the same parent ID. This prevents packets from being flooded twice by the same switch, but will flood packets through bridge ports belonging to a different switch. This method is problematic when stacked devices are taken into account, such as VLANs. In such cases, a physical port netdev can have upper devices being members in two different bridges, thus requiring two different 'offload_fwd_mark's to be configured on the port netdev, which is impossible. The main problem is that packet and netdev marking is performed at the physical netdev level, whereas flooding occurs between bridge ports, which are not necessarily port netdevs. Instead, packet and netdev marking should really be done in the bridge driver with the switch driver only telling it which packets it already forwarded. The bridge driver will mark such packets using the mark assigned to the ingress bridge port and will prevent the packet from being forwarded through any bridge port sharing the same mark (i.e. having the same parent ID). Remove the current switchdev 'offload_fwd_mark' implementation and instead implement the proposed method. In addition, make rocker - the sole user of the mark - use the proposed method. Signed-off-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Currently in process_backlog(), the process_queue dequeuing is performed with local IRQ disabled, to protect against flush_backlog(), which runs in hard IRQ context. This patch moves the flush operation to a work queue and runs the callback with bottom half disabled to protect the process_queue against dequeuing. Since process_queue is now always manipulated in bottom half context, the irq disable/enable pair around the dequeue operation are removed. To keep the flush time as low as possible, the flush works are scheduled on all online cpu simultaneously, using the high priority work-queue and statically allocated, per cpu, work structs. Overall this change increases the time required to destroy a device to improve slightly the packets reinjection performances. Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 8月, 2016 1 次提交
-
-
由 Sabrina Dubroca 提交于
The idea for type_check in dev_get_nest_level() was to count the number of nested devices of the same type (currently, only macvlan or vlan devices). This prevented the false positive lockdep warning on configurations such as: eth0 <--- macvlan0 <--- vlan0 <--- macvlan1 However, this doesn't prevent a warning on a configuration such as: eth0 <--- macvlan0 <--- vlan0 eth1 <--- vlan1 <--- macvlan1 In this case, all the locks end up with a nesting subclass of 1, so lockdep thinks that there is still a deadlock: - in the first case we have (macvlan_netdev_addr_lock_key, 1) and then take (vlan_netdev_xmit_lock_key, 1) - in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then take (macvlan_netdev_addr_lock_key, 1) By removing the linktype check in dev_get_nest_level() and always incrementing the nesting depth, lockdep considers this configuration valid. Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 8月, 2016 1 次提交
-
-
由 Jiri Kosina 提交于
Convert the per-device linked list into a hashtable. The primary motivation for this change is that currently, we're not tracking all the qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup performed over the linked list by qdisc_match_from_root() is rather expensive. The ultimate goal is to get rid of hidden qdiscs completely, which will bring much more determinism in user experience. Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 7月, 2016 1 次提交
-
-
由 Brenden Blanco 提交于
Add one new netdev op for drivers implementing the BPF_PROG_TYPE_XDP filter. The single op is used for both setup/query of the xdp program, modelled after ndo_setup_tc. Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 7月, 2016 1 次提交
-
-
由 Jesper Dangaard Brouer 提交于
An important information for the napi_poll tracepoint is knowing the work done (packets processed) by the napi_poll() call. Add both the work done and budget, as they are related. Handle trace_napi_poll() param change in dropwatch/drop_monitor and in python perf script netdev-times.py in backward compat way, as python fortunately supports optional parameter handling. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 7月, 2016 1 次提交
-
-
由 Jiri Pirko 提交于
L2 upper device needs to propagate neigh_construct/destroy calls down to lower devices. Do this by defining default ndo functions and use them in team, bond, bridge and vlan. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Reviewed-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 7月, 2016 1 次提交
-
-
由 Jiri Pirko 提交于
Add functions that iterate over lower devices and find port device. As a dependency add netdev_for_each_all_lower_dev and netdev_for_each_all_lower_dev_rcu macro with netdev_all_lower_get_next and netdev_all_lower_get_next_rcu shelpers. Also, add functions to return mlxsw struct according to lower device found and mlxsw_port struct with a reference to lower device. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Reviewed-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 6月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
Qdisc performance suffers when packets are dropped at enqueue() time because drops (kfree_skb()) are done while qdisc lock is held, delaying a dequeue() draining the queue. Nominal throughput can be reduced by 50 % when this happens, at a time we would like the dequeue() to proceed as fast as possible. Even FQ is vulnerable to this problem, while one of FQ goals was to provide some flow isolation. This patch adds a 'struct sk_buff **to_free' parameter to all qdisc->enqueue(), and in qdisc_drop() helper. I measured a performance increase of up to 12 %, but this patch is a prereq so that future batches in enqueue() can fly. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 6月, 2016 2 次提交
-
-
由 Wei Tang 提交于
The space is missing after ',', and this will introduce much more noise when checking patch around. Signed-off-by: NWei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Tang 提交于
This patch fixes the checkpatch.pl error to dev.c: ERROR: do not initialise statics to 0 Signed-off-by: NWei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 6月, 2016 2 次提交
-
-
由 Linus Torvalds 提交于
We always mixed in the parent pointer into the dentry name hash, but we did it late at lookup time. It turns out that we can simplify that lookup-time action by salting the hash with the parent pointer early instead of late. A few other users of our string hashes also wanted to mix in their own pointers into the hash, and those are updated to use the same mechanism. Hash users that don't have any particular initial salt can just use the NULL pointer as a no-salt. Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Borkmann 提交于
Respect the stack's xmit_recursion limit for calls into dev_queue_xmit(). Currently, they are not handeled by the limiter when attached to clsact's egress parent, for example, and a buggy program redirecting it to the same device again could run into stack overflow eventually. It would be good if we could notify an admin to give him a chance to react. We reuse xmit_recursion instead of having one private to eBPF, so that the stack's current recursion depth will be taken into account as well. Follow-up to commit 3896d655 ("bpf: introduce bpf_clone_redirect() helper") and 27b29f63 ("bpf: add bpf_redirect() helper"). Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 6月, 2016 1 次提交
-
-
由 Hariprasad Shenai 提交于
When in kdump kernel, reduce memory usage by only using a single Queue Set for multiqueue devices. So make netif_get_num_default_rss_queues() return one, when in kdump kernel. Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 6月, 2016 2 次提交
-
-
由 Eric Dumazet 提交于
Instead of using a single bit (__QDISC___STATE_RUNNING) in sch->__state, use a seqcount. This adds lockdep support, but more importantly it will allow us to sample qdisc/class statistics without having to grab qdisc root lock. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Note: Tom Herbert posted almost same patch 3 months back, but for different reasons. The reasons we want to get rid of this spin_trylock() are : 1) Under high qdisc pressure, the spin_trylock() has almost no chance to succeed. 2) We loop multiple times in softirq handler, eventually reaching the max retry count (10), and we schedule ksoftirqd. Since we want to adhere more strictly to ksoftirqd being waked up in the future (https://lwn.net/Articles/687617/), better avoid spurious wakeups. 3) calls to __netif_reschedule() dirty the cache line containing q->next_sched, slowing down the owner of qdisc. 4) RT kernels can not use the spin_trylock() here. With help of busylock, we get the qdisc spinlock fast enough, and the trylock trick brings only performance penalty. Depending on qdisc setup, I observed a gain of up to 19 % in qdisc performance (1016600 pps instead of 853400 pps, using prio+tbf+fq_codel) ("mpstat -I SCPU 1" is much happier now) Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Acked-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 5月, 2016 1 次提交
-
-
由 Daniel Borkmann 提交于
Follow-up for 8a3a4c6e ("net: make sch_handle_ingress() drop monitor ready") to also make the egress side drop monitor ready. Also here only TC_ACT_SHOT is a clear indication that something went wrong. Hence don't provide false positives to drop monitors such as 'perf record -e skb:kfree_skb ...'. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 5月, 2016 1 次提交
-
-
由 David Ahern 提交于
Currently the VRF driver uses the rx_handler to switch the skb device to the VRF device. Switching the dev prior to the ip / ipv6 layer means the VRF driver has to duplicate IP/IPv6 processing which adds overhead and makes features such as retaining the ingress device index more complicated than necessary. This patch moves the hook to the L3 layer just after the first NF_HOOK for PRE_ROUTING. This location makes exposing the original ingress device trivial (next patch) and allows adding other NF_HOOKs to the VRF driver in the future. dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb with the switched device through the packet taps to maintain current behavior (tcpdump can be used on either the vrf device or the enslaved devices). Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 5月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
TC_ACT_STOLEN is used when ingress traffic is mirred/redirected to say ifb. Packet is not dropped, but consumed. Only TC_ACT_SHOT is a clear indication something went wrong. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 5月, 2016 1 次提交
-
-
由 Alexander Duyck 提交于
This change makes it so that we will strip the TSO_MANGLEID bit if TSO is not present. This way we will also handle ECN correctly of TSO is not present. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 5月, 2016 1 次提交
-
-
由 Alexander Duyck 提交于
In the case of the mlx4 and mlx5 driver they do not support IPv6 checksum offload for tunnels. With this being the case we should disable GSO in addition to the checksum offload features when we find that a device cannot perform a checksum on a given packet type. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 4月, 2016 1 次提交
-
-
由 Nikolay Aleksandrov 提交于
is_skb_forwardable is not supposed to change anything so constify its arguments Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 4月, 2016 1 次提交
-
-
由 Jason Wang 提交于
There's no need to calculate rps hash if it was not enabled. So this patch export rps_needed and check it before trying to get rps hash. Tests (using pktgen to inject packets to guest) shows this can improve pps about 13% (when rps is disabled). Before: ~1150000 pps After: ~1300000 pps Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: NJason Wang <jasowang@redhat.com> ---- Changes from V1: - Fix build when CONFIG_RPS is not set Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 4月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
Rename NET_INC_STATS_BH() to __NET_INC_STATS() and NET_ADD_STATS_BH() to __NET_ADD_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 4月, 2016 1 次提交
-
-
由 Alexander Duyck 提交于
This patch adds support for NETIF_F_TSO_MANGLEID if a given tunnel supports NETIF_F_TSO. This way if needed a device can then later enable the TSO with IP ID mangling and the tunnels on top of that device can then also make use of the IP ID mangling as well. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 4月, 2016 4 次提交
-
-
由 Eric Dumazet 提交于
skbs given to validate_xmit_skb() should not have a next pointer anymore. Also if a packet is dropped, increment dev->tx_dropped __dev_queue_xmit() no longer has to change tx_dropped in this case. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This patch adds support for something I am referring to as GSO partial. The basic idea is that we can support a broader range of devices for segmentation if we use fixed outer headers and have the hardware only really deal with segmenting the inner header. The idea behind the naming is due to the fact that everything before csum_start will be fixed headers, and everything after will be the region that is handled by hardware. With the current implementation it allows us to add support for the following GSO types with an inner TSO_MANGLEID or TSO6 offload: NETIF_F_GSO_GRE NETIF_F_GSO_GRE_CSUM NETIF_F_GSO_IPIP NETIF_F_GSO_SIT NETIF_F_UDP_TUNNEL NETIF_F_UDP_TUNNEL_CSUM In the case of hardware that already supports tunneling we may be able to extend this further to support TSO_TCPV4 without TSO_MANGLEID if the hardware can support updating inner IPv4 headers. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This patch does two things. First it allows TCP to aggregate TCP frames with a fixed IPv4 ID field. As a result we should now be able to aggregate flows that were converted from IPv6 to IPv4. In addition this allows us more flexibility for future implementations of segmentation as we may be able to use a fixed IP ID when segmenting the flow. The second thing this does is that it places limitations on the outer IPv4 ID header in the case of tunneled frames. Specifically it forces the IP ID to be incrementing by 1 unless the DF bit is set in the outer IPv4 header. This way we can avoid creating overlapping series of IP IDs that could possibly be fragmented if the frame goes through GRO and is then resegmented via GSO. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This patch adds support for TSO using IPv4 headers with a fixed IP ID field. This is meant to allow us to do a lossless GRO in the case of TCP flows that use a fixed IP ID such as those that convert IPv6 header to IPv4 headers. In addition I am adding a feature that for now I am referring to TSO with IP ID mangling. Basically when this flag is enabled the device has the option to either output the flow with incrementing IP IDs or with a fixed IP ID regardless of what the original IP ID ordering was. This is useful in cases where the DF bit is set and we do not care if the original IP ID value is maintained. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 4月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
After introduction of ndo_features_check(), we believe that very specific checks for rare features should not be done in core networking stack. No driver uses gso_min_segs yet, so we revert this feature and save few instructions per tx packet in fast path. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 4月, 2016 1 次提交
-
-
由 Alexander Duyck 提交于
This patch fixes an issue I found in which we were dropping frames if we had enabled checksums on GRE headers that were encapsulated by either FOU or GUE. Without this patch I was barely able to get 1 Gb/s of throughput. With this patch applied I am now at least getting around 6 Gb/s. The issue is due to the fact that with FOU or GUE applied we do not provide a transport offset pointing to the GRE header, nor do we offload it in software as the GRE header is completely skipped by GSO and treated like a VXLAN or GENEVE type header. As such we need to prevent the stack from generating it and also prevent GRE from generating it via any interface we create. Fixes: c3483384 ("gro: Allow tunnel stacking in the case of FOU/GUE") Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 4月, 2016 1 次提交
-
-
由 Aaron Conole 提交于
When signaling that a GRO frame is ready to be processed, the network stack correctly checks length and aborts processing when a frame is less than 14 bytes. However, such a condition is really indicative of a broken driver, and should be loudly signaled, rather than silently dropped as the case is today. Convert the condition to use net_warn_ratelimited() to ensure the stack loudly complains about such broken drivers. Signed-off-by: NAaron Conole <aconole@bytheb.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 3月, 2016 1 次提交
-
-
由 Luis de Bethencourt 提交于
When the function dev_get_phys_port_name was added it missed a description for it's len argument. Adding it. Fixes: db24a904 ("net: add support for phys_port_name") Signed-off-by: NLuis de Bethencourt <luisbg@osg.samsung.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 3月, 2016 1 次提交
-
-
由 Jesse Gross 提交于
When drivers express support for TSO of encapsulated packets, they only mean that they can do it for one layer of encapsulation. Supporting additional levels would mean updating, at a minimum, more IP length fields and they are unaware of this. No encapsulation device expresses support for handling offloaded encapsulated packets, so we won't generate these types of frames in the transmit path. However, GRO doesn't have a check for multiple levels of encapsulation and will attempt to build them. UDP tunnel GRO actually does prevent this situation but it only handles multiple UDP tunnels stacked on top of each other. This generalizes that solution to prevent any kind of tunnel stacking that would cause problems. Fixes: bf5a755f ("net-gre-gro: Add GRE support to the GRO stack") Signed-off-by: NJesse Gross <jesse@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 2月, 2016 1 次提交
-
-
由 Nikolay Aleksandrov 提交于
When I used netdev_for_each_lower_dev in commit bad53162 ("vrf: remove slave queue and private slave struct") I thought that it acts like netdev_for_each_lower_private and can be used to remove the current device from the list while walking, but unfortunately it acts more like netdev_for_each_lower_private_rcu and doesn't allow it. The difference is where the "iter" points to, right now it points to the current element and that makes it impossible to remove it. Change the logic to be similar to netdev_for_each_lower_private and make it point to the "next" element so we can safely delete the current one. VRF is the only such user right now, there's no change for the read-only users. Here's what can happen now: [98423.249858] general protection fault: 0000 [#1] SMP [98423.250175] Modules linked in: vrf bridge(O) stp llc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ppdev aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd evdev serio_raw pcspkr virtio_balloon parport_pc parport i2c_piix4 i2c_core virtio_console acpi_cpufreq button 9pnet_virtio 9p 9pnet fscache ipv6 autofs4 ext4 crc16 mbcache jbd2 sg virtio_blk virtio_net sr_mod cdrom e1000 ata_generic ehci_pci uhci_hcd ehci_hcd usbcore usb_common virtio_pci ata_piix libata floppy virtio_ring virtio scsi_mod [last unloaded: bridge] [98423.255040] CPU: 1 PID: 14173 Comm: ip Tainted: G O 4.5.0-rc2+ #81 [98423.255386] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [98423.255777] task: ffff8800547f5540 ti: ffff88003428c000 task.ti: ffff88003428c000 [98423.256123] RIP: 0010:[<ffffffff81514f3e>] [<ffffffff81514f3e>] netdev_lower_get_next+0x1e/0x30 [98423.256534] RSP: 0018:ffff88003428f940 EFLAGS: 00010207 [98423.256766] RAX: 0002000100000004 RBX: ffff880054ff9000 RCX: 0000000000000000 [98423.257039] RDX: ffff88003428f8b8 RSI: ffff88003428f950 RDI: ffff880054ff90c0 [98423.257287] RBP: ffff88003428f940 R08: 0000000000000000 R09: 0000000000000000 [98423.257537] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88003428f9e0 [98423.257802] R13: ffff880054a5fd00 R14: ffff88003428f970 R15: 0000000000000001 [98423.258055] FS: 00007f3d76881700(0000) GS:ffff88005d000000(0000) knlGS:0000000000000000 [98423.258418] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [98423.258650] CR2: 00007ffe5951ffa8 CR3: 0000000052077000 CR4: 00000000000406e0 [98423.258902] Stack: [98423.259075] ffff88003428f960 ffffffffa0442636 0002000100000004 ffff880054ff9000 [98423.259647] ffff88003428f9b0 ffffffff81518205 ffff880054ff9000 ffff88003428f978 [98423.260208] ffff88003428f978 ffff88003428f9e0 ffff88003428f9e0 ffff880035b35f00 [98423.260739] Call Trace: [98423.260920] [<ffffffffa0442636>] vrf_dev_uninit+0x76/0xa0 [vrf] [98423.261156] [<ffffffff81518205>] rollback_registered_many+0x205/0x390 [98423.261401] [<ffffffff815183ec>] unregister_netdevice_many+0x1c/0x70 [98423.261641] [<ffffffff8153223c>] rtnl_delete_link+0x3c/0x50 [98423.271557] [<ffffffff815335bb>] rtnl_dellink+0xcb/0x1d0 [98423.271800] [<ffffffff811cd7da>] ? __inc_zone_state+0x4a/0x90 [98423.272049] [<ffffffff815337b4>] rtnetlink_rcv_msg+0x84/0x200 [98423.272279] [<ffffffff810cfe7d>] ? trace_hardirqs_on+0xd/0x10 [98423.272513] [<ffffffff8153370b>] ? rtnetlink_rcv+0x1b/0x40 [98423.272755] [<ffffffff81533730>] ? rtnetlink_rcv+0x40/0x40 [98423.272983] [<ffffffff8155d6e7>] netlink_rcv_skb+0x97/0xb0 [98423.273209] [<ffffffff8153371a>] rtnetlink_rcv+0x2a/0x40 [98423.273476] [<ffffffff8155ce8b>] netlink_unicast+0x11b/0x1a0 [98423.273710] [<ffffffff8155d2f1>] netlink_sendmsg+0x3e1/0x610 [98423.273947] [<ffffffff814fbc98>] sock_sendmsg+0x38/0x70 [98423.274175] [<ffffffff814fc253>] ___sys_sendmsg+0x2e3/0x2f0 [98423.274416] [<ffffffff810d841e>] ? do_raw_spin_unlock+0xbe/0x140 [98423.274658] [<ffffffff811e1bec>] ? handle_mm_fault+0x26c/0x2210 [98423.274894] [<ffffffff811e19cd>] ? handle_mm_fault+0x4d/0x2210 [98423.275130] [<ffffffff81269611>] ? __fget_light+0x91/0xb0 [98423.275365] [<ffffffff814fcd42>] __sys_sendmsg+0x42/0x80 [98423.275595] [<ffffffff814fcd92>] SyS_sendmsg+0x12/0x20 [98423.275827] [<ffffffff81611bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a [98423.276073] Code: c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 06 55 48 81 c7 c0 00 00 00 48 89 e5 48 8b 00 48 39 f8 74 09 48 89 06 <48> 8b 40 e8 5d c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 [98423.279639] RIP [<ffffffff81514f3e>] netdev_lower_get_next+0x1e/0x30 [98423.279920] RSP <ffff88003428f940> CC: David Ahern <dsa@cumulusnetworks.com> CC: David S. Miller <davem@davemloft.net> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Vlad Yasevich <vyasevic@redhat.com> Fixes: bad53162 ("vrf: remove slave queue and private slave struct") Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: NDavid Ahern <dsa@cumulusnetworks.com> Tested-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 2月, 2016 1 次提交
-
-
由 Phil Sutter 提交于
My implementation around IFF_NO_QUEUE driver flag assumed that leaving tx_queue_len untouched (specifically: not setting it to zero) by drivers would make it possible to assign a regular qdisc to them without having to worry about setting tx_queue_len to a useful value. This was only partially true: I overlooked that some drivers don't call ether_setup() and therefore not initialize tx_queue_len to the default value of 1000. Consequently, removing the workarounds in place for that case in qdisc implementations which cared about it (namely, pfifo, bfifo, gred, htb, plug and sfb) leads to problems with these specific interface types and qdiscs. Luckily, there's already a sanitization point for drivers setting tx_queue_len to zero, which can be reused to assign the fallback value most qdisc implementations used, which is 1. Fixes: 348e3435 ("net: sched: drop all special handling of tx_queue_len == 0") Tested-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: NPhil Sutter <phil@nwl.cc> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 2月, 2016 2 次提交
-
-
由 Jesper Dangaard Brouer 提交于
The network stack defers SKBs free, in-case free happens in IRQ or when IRQs are disabled. This happens in __dev_kfree_skb_irq() that writes SKBs that were free'ed during IRQ to the softirq completion queue (softnet_data.completion_queue). These SKBs are naturally delayed, and cleaned up during NET_TX_SOFTIRQ in function net_tx_action(). Take advantage of this a use the skb defer and flush API, as we are already in softirq context. For modern drivers this rarely happens. Although most drivers do call dev_kfree_skb_any(), which detects the situation and calls __dev_kfree_skb_irq() when needed. This due to netpoll can call from IRQ context. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Dangaard Brouer 提交于
Discovered that network stack were hitting the kmem_cache/SLUB slowpath when freeing SKBs. Doing bulk free with kmem_cache_free_bulk can speedup this slowpath. NAPI context is a bit special, lets take advantage of that for bulk free'ing SKBs. In NAPI context we are running in softirq, which gives us certain protection. A softirq can run on several CPUs at once. BUT the important part is a softirq will never preempt another softirq running on the same CPU. This gives us the opportunity to access per-cpu variables in softirq context. Extend napi_alloc_cache (before only contained page_frag_cache) to be a struct with a small array based stack for holding SKBs. Introduce a SKB defer and flush API for accessing this. Introduce napi_consume_skb() as replacement for e.g. dev_consume_skb_any() when running in NAPI context. A small trick to handle/detect if we are called from netpoll is to see if budget is 0. In that case, we need to invoke dev_consume_skb_irq(). Joint work with Alexander Duyck. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 2月, 2016 2 次提交
-
-
由 Jarod Wilson 提交于
This adds an rx_nohandler stat counter, along with a sysfs statistics node, and copies the counter out via netlink as well. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@mellanox.com> CC: Daniel Borkmann <daniel@iogearbox.net> CC: Tom Herbert <tom@herbertland.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: netdev@vger.kernel.org Signed-off-by: NJarod Wilson <jarod@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jarod Wilson 提交于
The netdev_stats_to_stats64 function copies the deprecated net_device_stats format stats into rtnl_link_stats64 for legacy support purposes, but with the BUILD_BUG_ON as it was, it wasn't possible to extend rtnl_link_stats64 without also extending net_device_stats. Relax the BUILD_BUG_ON to only require that rtnl_link_stats64 is larger, and zero out all the stat counters that aren't present in net_device_stats. CC: Eric Dumazet <edumazet@google.com> CC: netdev@vger.kernel.org Signed-off-by: NJarod Wilson <jarod@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-