- 14 1月, 2017 13 次提交
-
-
由 Yuchung Cheng 提交于
This patch disables FACK by default as RACK is the successor of FACK (inspired by the insights behind FACK). FACK[1] in Linux works as follows: a packet P is deemed lost, if packet Q of higher sequence is s/acked and P and Q are distant by at least dupthresh number of packets in sequence space. FACK is more aggressive than the IETF recommened recovery for SACK (RFC3517 A Conservative Selective Acknowledgment (SACK)-based Loss Recovery Algorithm for TCP), because a single SACK may trigger fast recovery. This obviously won't work well with reordering so FACK is dynamically disabled upon detecting reordering. RACK supersedes FACK by using time distance instead of sequence distance. On reordering, RACK waits for a quarter of RTT receiving a single SACK before starting recovery. (the timer can be made more adaptive in the future by measuring reordering distance in time, but currently RTT/4 seem to work well.) Once the recovery starts, RACK behaves almost like FACK because it reduces the reodering window to 1ms, so it fast retransmits quickly. In addition RACK can detect loss retransmission as it does not care about the packet sequences (being repeated or not), which is extremely useful when the connection is going through a traffic policer. Google server experiments indicate that disabling FACK after enabling RACK has negligible impact on the overall loss recovery performance with more reordering events detected. But we still keep the FACK implementation for backup if RACK has bugs that needs to be disabled. [1] M. Mathis, J. Mahdavi, "Forward Acknowledgment: Refining TCP Congestion Control," In Proceedings of SIGCOMM '96, August 1996. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
Thin stream DUPACK is to start fast recovery on only one DUPACK provided the connection is a thin stream (i.e., low inflight). But this older feature is now subsumed with RACK. If a connection receives only a single DUPACK, RACK would arm a reordering timer and soon starts fast recovery instead of timeout if no further ACKs are received. The socket option (THIN_DUPACK) is kept as a nop for compatibility. Note that this patch does not change another thin-stream feature which enables linear RTO. Although it might be good to generalize that in the future (i.e., linear RTO for the first say 3 retries). Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
This patch removes the (partial) implementation of the aggressive limited transmit in RFC4653 TCP Non-Congestion Robustness (NCR). NCR is a mitigation to the problem created by the dynamic DUPACK threshold. With the current adaptive DUPACK threshold (tp->reordering) could cause timeouts by preventing fast recovery. For example, if the last packet of a cwnd burst was reordered, the threshold will be set to the size of cwnd. But if next application burst is smaller than threshold and has drops instead of reorderings, the sender would not trigger fast recovery but instead resorts to a timeout recovery. NCR mitigates this issue by checking the number of DUPACKs against the current flight size additionally. The techniqueue is similar to the early retransmit RFC. With RACK loss detection, this mitigation is not needed, because RACK does not use DUPACK threshold to detect losses. RACK arms a reordering timer to fire at most a quarter RTT later to start fast recovery. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
This patch removes the support of RFC5827 early retransmit (i.e., fast recovery on small inflight with <3 dupacks) because it is subsumed by the new RACK loss detection. More specifically when RACK receives DUPACKs, it'll arm a reordering timer to start fast recovery after a quarter of (min)RTT, hence it covers the early retransmit except RACK does not limit itself to specific inflight or dupack numbers. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
Forward retransmit is an esoteric feature in RFC3517 (condition(3) in the NextSeg()). Basically if a packet is not considered lost by the current criteria (# of dupacks etc), but the congestion window has room for more packets, then retransmit this packet. However it actually conflicts with the rest of recovery design. For example, when reordering is detected we want to be conservative in retransmitting packets but forward-retransmit feature would break that to force more retransmission. Also the implementation is fairly complicated inside the retransmission logic inducing extra iterations in the write queue. With RACK losses are being detected timely and this heuristic is no longer necessary. There this patch removes the feature. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
Current F-RTO reverts cwnd reset whenever a never-retransmitted packet was (s)acked. The timeout can be declared spurious because the packets acknoledged with this ACK was transmitted before the timeout, so clearly not all the packets are lost to reset the cwnd. This nice detection does not really depend F-RTO internals. This patch applies the detection universally. On Google servers this change detected 20% more spurious timeouts. Suggested-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
This patch changes two things: 1. Start fast recovery with RACK in addition to other heuristics (e.g., DUPACK threshold, FACK). Prior to this change RACK is enabled to detect losses only after the recovery has started by other algorithms. 2. Disable TCP early retransmit. RACK subsumes the early retransmit with the new reordering timer feature. A latter patch in this series removes the early retransmit code. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
Currently RACK would mark loss before the undo operations in TCP loss recovery. This could incorrectly identify real losses as spurious. For example a sender first experiences a delay spike and then eventually some packets were lost due to buffer overrun. In this case, the sender should perform fast recovery b/c not all the packets were lost. But the sender may first trigger a (spurious) RTO and reset cwnd to 1. The following ACKs may used to mark real losses by tcp_rack_mark_lost. Then in tcp_process_loss this ACK could trigger F-RTO undo condition and unmark real losses and revert the cwnd reduction. If there are no more ACKs coming back, eventually the sender would timeout again instead of performing fast recovery. The patch fixes this incorrect process by always performing the undo checks before detecting losses. Fixes: 4f41b1c5 ("tcp: use RACK to detect losses") Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
The packets inside a jumbo skb (e.g., TSO) share the same skb timestamp, even though they are sent sequentially on the wire. Since RACK is based on time, it can not detect some packets inside the same skb are lost. However, we can leverage the packet sequence numbers as extended timestamps to detect losses. Therefore, when RACK timestamp is identical to skb's timestamp (i.e., one of the packets of the skb is acked or sacked), we use the sequence numbers of the acked and unacked packets to break ties. We can use the same sequence logic to advance RACK xmit time as well to detect more losses and avoid timeout. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
This patch makes RACK install a reordering timer when it suspects some packets might be lost, but wants to delay the decision a little bit to accomodate reordering. It does not create a new timer but instead repurposes the existing RTO timer, because both are meant to retransmit packets. Specifically it arms a timer ICSK_TIME_REO_TIMEOUT when the RACK timing check fails. The wait time is set to RACK.RTT + RACK.reo_wnd - (NOW - Packet.xmit_time) + fudge This translates to expecting a packet (Packet) should take (RACK.RTT + RACK.reo_wnd + fudge) to deliver after it was sent. When there are multiple packets that need a timer, we use one timer with the maximum timeout. Therefore the timer conservatively uses the maximum window to expire N packets by one timeout, instead of N timeouts to expire N packets sent at different times. The fudge factor is 2 jiffies to ensure when the timer fires, all the suspected packets would exceed the deadline and be marked lost by tcp_rack_detect_loss(). It has to be at least 1 jiffy because the clock may tick between calling icsk_reset_xmit_timer(timeout) and actually hang the timer. The next jiffy is to lower-bound the timeout to 2 jiffies when reo_wnd is < 1ms. When the reordering timer fires (tcp_rack_reo_timeout): If we aren't in Recovery we'll enter fast recovery and force fast retransmit. This is very similar to the early retransmit (RFC5827) except RACK is not constrained to only enter recovery for small outstanding flights. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
Record the most recent RTT in RACK. It is often identical to the "ca_rtt_us" values in tcp_clean_rtx_queue. But when the packet has been retransmitted, RACK choses to believe the ACK is for the (latest) retransmitted packet if the RTT is over minimum RTT. This requires passing the arrival time of the most recent ACK to RACK routines. The timestamp is now recorded in the "ack_time" in tcp_sacktag_state during the ACK processing. This patch does not change the RACK algorithm itself. It only adds the RTT variable to prepare the next main patch. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
Create a new helper tcp_rack_detect_loss to prepare the upcoming RACK reordering timer patch. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
Create a new helper tcp_rack_mark_skb_lost to prepare the upcoming RACK reordering timer support. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 1月, 2017 3 次提交
-
-
由 Eric Dumazet 提交于
Current allocations are not NUMA aware, and lack proper cleanup in case of error. It is perfectly fine to use static per cpu allocations for 256 bytes per cpu. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: David Lebrun <david.lebrun@uclouvain.be> Acked-by: NDavid Lebrun <david.lebrun@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
Recently we started using ipmr with thousands of entries and easily hit soft lockups on smaller devices. The reason is that the hash function uses the high order bits from the src and dst, but those don't change in many common cases, also the hash table is only 64 elements so with thousands it doesn't scale at all. This patch migrates the hash table to rhashtable, and in particular the rhl interface which allows for duplicate elements to be chained because of the MFC_PROXY support (*,G; *,*,oif cases) which allows for multiple duplicate entries to be added with different interfaces (IMO wrong, but it's been in for a long time). And here are some results from tests I've run in a VM: mr_table size (default, allocated for all namespaces): Before After 49304 bytes 2400 bytes Add 65000 routes (the diff is much larger on smaller devices): Before After 1m42s 58s Forwarding 256 byte packets with 65000 routes (test done in a VM): Before After 3 Mbps / ~1465 pps 122 Mbps / ~59000 pps As a bonus we no longer see the soft lockups on smaller devices which showed up even with 2000 entries before. Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Fixes following warnings : net/core/secure_seq.c:125:28: warning: incorrect type in argument 1 (different base types) net/core/secure_seq.c:125:28: expected unsigned int const [unsigned] [usertype] a net/core/secure_seq.c:125:28: got restricted __be32 [usertype] saddr net/core/secure_seq.c:125:35: warning: incorrect type in argument 2 (different base types) net/core/secure_seq.c:125:35: expected unsigned int const [unsigned] [usertype] b net/core/secure_seq.c:125:35: got restricted __be32 [usertype] daddr net/core/secure_seq.c:125:43: warning: cast from restricted __be16 net/core/secure_seq.c:125:61: warning: restricted __be16 degrades to integer Fixes: 7cd23e53 ("secure_seq: use SipHash in place of MD5") Signed-off-by: NEric Dumazet <edumazet@google.com> Reviewed-by: NJason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 1月, 2017 8 次提交
-
-
由 Wei Yongjun 提交于
Fixes the following sparse warning: net/core/lwt_bpf.c:355:5: warning: symbol 'bpf_lwt_prog_cmp' was not declared. Should it be static? Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
When structs are used to store temporary state in cb[] buffer that is used with programs and among tail calls, then the generated code will not always access the buffer in bpf_w chunks. We can ease programming of it and let this act more natural by allowing for aligned b/h/w/dw sized access for cb[] ctx member. Various test cases are attached as well for the selftest suite. Potentially, this can also be reused for other program types to pass data around. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Currently, when calling convert_ctx_access() callback for the various program types, we pass in insn->dst_reg, insn->src_reg, insn->off from the original instruction. This information is needed to rewrite the instruction that is based on the user ctx structure into a kernel representation for the ctx. As we'd like to allow access size beyond just BPF_W, we'd need also insn->code for that in order to decode the original access size. Given that, lets just pass insn directly to the convert_ctx_access() callback and work on that to not clutter the callback with even more arguments we need to pass when everything is already contained in insn. So lets go through that once, no functional change. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ursula Braun 提交于
When creating an SMC connection, there is a CLC (connection layer control) handshake to prepare for RDMA traffic. The corresponding code is part of commit 0cfdd8f9 ("smc: connection and link group creation"). Mac addresses to be exchanged in the handshake are copied with a wrong length of 12 instead of 6 bytes. Following code overwrites the wrongly copied code, but nevertheless the correct length should already be used for the preceding mac address copying. Use ETH_ALEN for the memcpy length with mac addresses. Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com> Fixes: 0cfdd8f9 ("smc: connection and link group creation") Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ursula Braun 提交于
When introducing the new socket family AF_SMC in commit ac713874 ("smc: establish new socket family"), a typo in af_family_clock_key_strings has slipped in. This patch repairs it. Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com> Fixes: ac713874 ("smc: establish new socket family") Reported-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Fainelli 提交于
netif_wake_subqueue() is duplicating the same thing that netif_tx_wake_queue() does, so make it call it directly after looking up the queue from the index. Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Simon Horman 提交于
Support matching on ARP operation, and hardware and protocol addresses for Ethernet hardware and IPv4 protocol addresses. Example usage: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol arp parent ffff: flower indev eth0 \ arp_op request arp_sip 10.0.0.1 action drop tc filter add dev eth0 protocol rarp parent ffff: flower indev eth0 \ arp_op reply arp_tha 52:54:3f:00:00:00/24 action drop Signed-off-by: NSimon Horman <simon.horman@netronome.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Simon Horman 提交于
Allow dissection of (R)ARP operation hardware and protocol addresses for Ethernet hardware and IPv4 protocol addresses. There are currently no users of FLOW_DISSECTOR_KEY_ARP. A follow-up patch will allow FLOW_DISSECTOR_KEY_ARP to be used by the flower classifier. Signed-off-by: NSimon Horman <simon.horman@netronome.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 1月, 2017 16 次提交
-
-
由 Colin Ian King 提交于
Trivial fix to spelling mistake in WARN_ONCE message Signed-off-by: NColin Ian King <colin.king@canonical.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
fib_select_path does not call fib_select_multipath if oif is set in the flow struct. For VRF use cases oif is always set, so multipath route selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif check similar to what is done in fib_table_lookup. Add saddr and proto to the flow struct for the fib lookup done by the VRF driver to better match hash computation for a flow. Fixes: 613d09b3 ("net: Use VRF device index for lookups on TX") Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Fainelli 提交于
This reverts commit 3a543ef4 ("net: dsa: Implement ndo_get_phys_port_id") since it misuses the purpose of ndo_get_phys_port_id(). We have ndo_get_phys_port_name() to do the correct thing for us now. Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Fainelli 提交于
Return the physical port number of a DSA created network device using ndo_get_phys_port_name(). Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Tested-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnd Bergmann 提交于
We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is not right when CONFIG_NET is disabled and there is no socket interface: warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET) I don't know what the correct solution for this is, but simply removing the dependency on NET from SOCK_CGROUP_DATA by moving it out of the 'if NET' section avoids the warning and does not produce other build errors. Fixes: 483c4933 ("cgroup: Fix CGROUP_BPF config") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vivien Didelot 提交于
In the new DTS bindings for DSA (dsa2), the "ethernet" and "link" phandles are respectively mandatory and exclusive to CPU port and DSA link device tree nodes. Simplify dsa2.c a bit by checking the presence of such phandle instead of checking the redundant "label" property. Then the Linux philosophy for Ethernet switch ports is to expose them to userspace as standard NICs by default. Thus use the standard enumerated "eth%d" device name if no "label" property is provided for a user port. This allows to save DTS files from subjective net device names. If one wants to rename an interface, udev rules can be used as usual. Of course the current behavior is unchanged, and the optional "label" property for user ports has precedence over the enumerated name. Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: NUwe Kleine-König <uwe@kleine-koenig.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
On 32bit arches, (skb->end - skb->data) is not 'unsigned int', so we shall use min_t() instead of min() to avoid a compiler error. Fixes: 1272ce87 ("gro: Enter slow-path if there is no tailroom") Reported-by: Nkernel test robot <fengguang.wu@intel.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
Patch series "Page fragment updates", v4. This patch series takes care of a few cleanups for the page fragments API. First we do some renames so that things are much more consistent. First we move the page_frag_ portion of the name to the front of the functions names. Secondly we split out the cache specific functions from the other page fragment functions by adding the word "cache" to the name. Finally I added a bit of documentation that will hopefully help to explain some of this. I plan to revisit this later as we get things more ironed out in the near future with the changes planned for the DMA setup to support eXpress Data Path. This patch (of 3): This patch renames the page frag functions to be more consistent with other APIs. Specifically we place the name page_frag first in the name and then have either an alloc or free call name that we append as the suffix. This makes it a bit clearer in terms of naming. In addition we drop the leading double underscores since we are technically no longer a backing interface and instead the front end that is called from the networking APIs. Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomainSigned-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Herbert Xu 提交于
The GRO fast path caches the frag0 address. This address becomes invalid if frag0 is modified by pskb_may_pull or its variants. So whenever that happens we must disable the frag0 optimization. This is usually done through the combination of gro_header_hard and gro_header_slow, however, the IPv6 extension header path did the pulling directly and would continue to use the GRO fast path incorrectly. This patch fixes it by disabling the fast path when we enter the IPv6 extension header path. Fixes: 78a478d0 ("gro: Inline skb_gro_header and cache frag0 virtual address") Reported-by: NSlava Shwartsman <slavash@mellanox.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
The GRO path has a fast-path where we avoid calling pskb_may_pull and pskb_expand by directly accessing frag0. However, this should only be done if we have enough tailroom in the skb as otherwise we'll have to expand it later anyway. This patch adds the check by capping frag0_len with the skb tailroom. Fixes: cb18978c ("gro: Open-code final pskb_may_pull") Reported-by: NSlava Shwartsman <slavash@mellanox.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julian Wiedmann 提交于
With commit e5374399 ("af_iucv: use paged SKBs for big outbound messages"), we transmit paged skbs for both of AF_IUCV's transport modes (IUCV or HiperSockets). The qeth driver for Layer 3 HiperSockets currently doesn't support NETIF_F_SG, so these skbs would just be linearized again by the stack. Avoid that overhead by using paged skbs only for IUCV transport. cc stable, since this also circumvents a significant skb leak when sending large messages (where the skb then needs to be linearized). Signed-off-by: NJulian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> # v4.8+ Fixes: e5374399 ("af_iucv: use paged SKBs for big outbound messages") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sowmini Varadhan 提交于
Commit 7f953ab2 ("af_packet: TX_RING support for TPACKET_V3") now makes it possible to use TX_RING with TPACKET_V3, so make the the relevant information available via 'ss -e -a --packet' Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Anna, Suman 提交于
Commit bdabad3e ("net: Add Qualcomm IPC router") introduced a new address family. Update the family name tables accordingly so that the lockdep initialization can use the proper names for this family. Cc: Courtney Cavin <courtney.cavin@sonymobile.com> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: NSuman Anna <s-anna@ti.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Boyd 提交于
Failure to mark this pointer as __le32 causes checkers like sparse to complain: net/qrtr/qrtr.c:274:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:274:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:274:16: got restricted __le32 [usertype] <noident> net/qrtr/qrtr.c:275:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:275:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:275:16: got restricted __le32 [usertype] <noident> net/qrtr/qrtr.c:276:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:276:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:276:16: got restricted __le32 [usertype] <noident> Silence it. Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: NStephen Boyd <sboyd@codeaurora.org> Acked-by: NBjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Fainelli 提交于
It is perfectly possible to have non zero indexed switches being present in a DSA switch tree, in such a case, we will be deferencing a NULL pointer while dsa_cpu_port_ethtool_{setup,restore}. Be more defensive and ensure that dst->ds[0] is valid before doing anything with it. Fixes: 0c73c523 ("net: dsa: Initialize CPU port ethtool ops per tree") Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Removes following sparse complain : net/core/flow_dissector.c:70:8: warning: symbol 'skb_flow_get_be16' was not declared. Should it be static? Fixes: 972d3876 ("flow dissector: ICMP support") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-