- 03 12月, 2016 1 次提交
-
-
由 Florian Westphal 提交于
jiffies based timestamps allow for easy inference of number of devices behind NAT translators and also makes tracking of hosts simpler. commit ceaa1fef ("tcp: adding a per-socket timestamp offset") added the main infrastructure that is needed for per-connection ts randomization, in particular writing/reading the on-wire tcp header format takes the offset into account so rest of stack can use normal tcp_time_stamp (jiffies). So only two items are left: - add a tsoffset for request sockets - extend the tcp isn generator to also return another 32bit number in addition to the ISN. Re-use of ISN generator also means timestamps are still monotonically increasing for same connection quadruple, i.e. PAWS will still work. Includes fixes from Eric Dumazet. Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 12月, 2016 5 次提交
-
-
由 Soheil Hassas Yeganeh 提交于
Only when ICMP packets are enqueued onto the error queue, sk_err is also set. Before f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb), a subsequent error queue read would set sk_err to the next error on the queue, or 0 if empty. As no error types other than ICMP set this field, sk_err should not be modified upon dequeuing them. Only for ICMP errors, reset the (racy) sk_err. Some applications, like traceroute, rely on it and go into a futile busy POLLERR loop otherwise. In principle, sk_err has to be set while an ICMP error is queued. Testing is_icmp_err_skb(skb_next) approximates this without requiring a full queue walk. Applications that receive both ICMP and other errors cannot rely on this legacy behavior, as other errors do not set sk_err in the first place. Fixes: f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb) Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Acked-by: NMaciej Żenczykowski <maze@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Thomas Graf 提交于
Registers new BPF program types which correspond to the LWT hooks: - BPF_PROG_TYPE_LWT_IN => dst_input() - BPF_PROG_TYPE_LWT_OUT => dst_output() - BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit() The separate program types are required to differentiate between the capabilities each LWT hook allows: * Programs attached to dst_input() or dst_output() are restricted and may only read the data of an skb. This prevent modification and possible invalidation of already validated packet headers on receive and the construction of illegal headers while the IP headers are still being assembled. * Programs attached to lwtunnel_xmit() are allowed to modify packet content as well as prepending an L2 header via a newly introduced helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is invoked after the IP header has been assembled completely. All BPF programs receive an skb with L3 headers attached and may return one of the following error codes: BPF_OK - Continue routing as per nexthop BPF_DROP - Drop skb and return EPERM BPF_REDIRECT - Redirect skb to device as per redirect() helper. (Only valid in lwtunnel_xmit() context) The return codes are binary compatible with their TC_ACT_ relatives to ease compatibility. Signed-off-by: NThomas Graf <tgraf@suug.ch> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Thomas Graf 提交于
A route on the output path hitting a RTN_LOCAL route will keep the dst associated on its way through the loopback device. On the receive path, the dst_input() call will thus invoke the input handler of the route created in the output path. Thus, lwt redirection for input must be done for dsts allocated in the otuput path as well. Also, if a route is cached in the input path, the allocated dst should respect lwtunnel configuration on the nexthop as well. Signed-off-by: NThomas Graf <tgraf@suug.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Thomas Graf 提交于
orig_output for IPv4 was only set for dsts which hit an input route. Set it consistently for locally generated traffic as well to allow lwt to continue the dst_output() path as configured by the nexthop. Fixes: 25368623 ("lwt: Add support to redirect dst.input") Signed-off-by: NThomas Graf <tgraf@suug.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Zhang Shengju 提交于
Before this patch, function ndo_dflt_fdb_dump() will always return code from uc fdb dump. The reture code of mc fdb dump is lost. Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 12月, 2016 2 次提交
-
-
由 Lorenzo Colitti 提交于
Commit e2d118a1 ("net: inet: Support UID-based routing in IP protocols.") made __build_flow_key call sock_net(sk) to determine the network namespace of the passed-in socket. This crashes if sk is NULL. Fix this by getting the network namespace from the skb instead. Fixes: e2d118a1 ("net: inet: Support UID-based routing in IP protocols.") Reported-by: NErez Shitrit <erezsh@dev.mellanox.co.il> Signed-off-by: NLorenzo Colitti <lorenzo@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Zhang Shengju 提交于
Currently loop index 'idx' is used as the index in the neigh list of interest. It's increased only when the neigh is dumped. It's not the absolute index in the list. Because there is no info to record which neigh has already be scanned by previous loop. This will cause the filtered out neighs to be scanned mulitple times. This patch make idx as the absolute index in the list, it will increase no matter whether the neigh is filtered. This will prevent the above problem. And this is in line with other dump functions. v2: - take David Ahern's advice to do simple change Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 11月, 2016 7 次提交
-
-
由 Daniel Borkmann 提交于
Add an IFLA_XDP_FLAGS attribute that can be passed for setting up XDP along with IFLA_XDP_FD, which eventually allows user space to implement typical add/replace/delete logic for programs. Right now, calling into dev_change_xdp_fd() will always replace previous programs. When passed XDP_FLAGS_UPDATE_IF_NOEXIST, we can handle this more graceful when requested by returning -EBUSY in case we try to attach a new program, but we find that another one is already attached. This will be used by upcoming front-end for iproute2 as well. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Francis Yan 提交于
This patch exports the sender chronograph stats via the socket SO_TIMESTAMPING channel. Currently we can instrument how long a particular application unit of data was queued in TCP by tracking SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having these sender chronograph stats exported simultaneously along with these timestamps allow further breaking down the various sender limitation. For example, a video server can tell if a particular chunk of video on a connection takes a long time to deliver because TCP was experiencing small receive window. It is not possible to tell before this patch without packet traces. To prepare these stats, the user needs to set SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags while requesting other SOF_TIMESTAMPING TX timestamps. When the timestamps are available in the error queue, the stats are returned in a separate control message of type SCM_TIMESTAMPING_OPT_STATS, in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME, TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond. Signed-off-by: NFrancis Yan <francisyyan@gmail.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Francis Yan 提交于
This patch exports all the sender chronograph measurements collected in the previous patches to TCP_INFO interface. Note that busy time exported includes all the other sending limits (rwnd-limited, sndbuf-limited). Internally the time unit is jiffy but externally the measurements are in microseconds for future extensions. Signed-off-by: NFrancis Yan <francisyyan@gmail.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Francis Yan 提交于
This patch measures the amount of time when TCP runs out of new data to send to the network due to insufficient send buffer, while TCP is still busy delivering (i.e. write queue is not empty). The goal is to indicate either the send buffer autotuning or user SO_SNDBUF setting has resulted network under-utilization. The measurement starts conservatively by checking various conditions to minimize false claims (i.e. under-estimation is more likely). The measurement stops when the SOCK_NOSPACE flag is cleared. But it does not account the time elapsed till the next application write. Also the measurement only starts if the sender is still busy sending data, s.t. the limit accounted is part of the total busy time. Signed-off-by: NFrancis Yan <francisyyan@gmail.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Francis Yan 提交于
This patch measures the total time when the TCP stops sending because the receiver's advertised window is not large enough. Note that once the limit is lifted we are likely in the busy status if we have data pending. Signed-off-by: NFrancis Yan <francisyyan@gmail.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Francis Yan 提交于
This patch measures TCP busy time, which is defined as the period of time when sender has data (or FIN) to send. The time starts when data is buffered and stops when the write queue is flushed by ACKs or error events. Note the busy time does not include SYN time, unless data is included in SYN (i.e. Fast Open). It does include FIN time even if the FIN carries no payload. Excluding pure FIN is possible but would incur one additional test in the fast path, which may not be worth it. Signed-off-by: NFrancis Yan <francisyyan@gmail.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Francis Yan 提交于
This patch implements the skeleton of the TCP chronograph instrumentation on sender side limits: 1) idle (unspec) 2) busy sending data other than 3-4 below 3) rwnd-limited 4) sndbuf-limited The limits are enumerated 'tcp_chrono'. Since a connection in theory can idle forever, we do not track the actual length of this uninteresting idle period. For the rest we track how long the sender spends in each limit. At any point during the life time of a connection, the sender must be in one of the four states. If there are multiple conditions worthy of tracking in a chronograph then the highest priority enum takes precedence over the other conditions. So that if something "more interesting" starts happening, stop the previous chrono and start a new one. The time unit is jiffy(u32) in order to save space in tcp_sock. This implies application must sample the stats no longer than every 49 days of 1ms jiffy. Signed-off-by: NFrancis Yan <francisyyan@gmail.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 11月, 2016 2 次提交
-
-
由 Daniel Borkmann 提交于
Commit dcf80034 ("net/sched: act_mirred: Refactor detection whether dev needs xmit at mac header") added dev_is_mac_header_xmit(); since it's also useful elsewhere, move it to if_arp.h and reuse it for BPF. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
After setup we don't need to keep user space fd number around anymore, as it also has no useful meaning for anyone, just remove it. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 11月, 2016 8 次提交
-
-
由 Jon Paul Maloy 提交于
In commit 10724cc7 ("tipc: redesign connection-level flow control") we replaced the previous message based flow control with one based on 1k blocks. In order to ensure backwards compatibility the mechanism falls back to using message as base unit when it senses that the peer doesn't support the new algorithm. The default flow control window, i.e., how many units can be sent before the sender blocks and waits for an acknowledge (aka advertisement) is 512. This was tested against the previous version, which uses an acknowledge frequency of on ack per 256 received message, and found to work fine. However, we missed the fact that versions older than Linux 3.15 use an acknowledge frequency of 512, which is exactly the limit where a 4.6+ sender will stop and wait for acknowledge. This would also work fine if it weren't for the fact that if the first sent message on a 4.6+ server side is an empty SYNACK, this one is also is counted as a sent message, while it is not counted as a received message on a legacy 3.15-receiver. This leads to the sender always being one step ahead of the receiver, a scenario causing the sender to block after 512 sent messages, while the receiver only has registered 511 read messages. Hence, the legacy receiver is not trigged to send an acknowledge, with a permanently blocked sender as result. We solve this deadlock by simply allowing the sender to send one more message before it blocks, i.e., by a making minimal change to the condition used for determining connection congestion. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Miroslav Lichvar 提交于
The ETHTOOL_GLINKSETTINGS command is deprecating the ETHTOOL_GSET command and likewise it shouldn't require the CAP_NET_ADMIN capability. Signed-off-by: NMiroslav Lichvar <mlichvar@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
In commit 35c55c98 ("tipc: add neighbor monitoring framework") we added a data area to the link monitor STATE messages under the assumption that previous versions did not use any such data area. For versions older than Linux 4.3 this assumption is not correct. In those version, all STATE messages sent out from a node inadvertently contain a 16 byte data area containing a string; -a leftover from previous RESET messages which were using this during the setup phase. This string serves no purpose in STATE messages, and should no be there. Unfortunately, this data area is delivered to the link monitor framework, where a sanity check catches that it is not a correct domain record, and drops it. It also issues a rate limited warning about the event. Since such events occur much more frequently than anticipated, we now choose to remove the warning in order to not fill the kernel log with useless contents. We also make the sanity check stricter, to further reduce the risk that such data is inavertently admitted. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
commit 81729810 ("tipc: fix link priority propagation") introduced a compatibility problem between TIPC versions newer than Linux 4.6 and those older than Linux 4.4. In versions later than 4.4, link STATE messages only contain a non-zero link priority value when the sender wants the receiver to change its priority. This has the effect that the receiver resets itself in order to apply the new priority. This works well, and is consistent with the said commit. However, in versions older than 4.4 a valid link priority is present in all sent link STATE messages, leading to cyclic link establishment and reset on the 4.6+ node. We fix this by adding a test that the received value should not only be valid, but also differ from the current value in order to cause the receiving link endpoint to reset. Reported-by: NAmar Nv <amar.nv005@gmail.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Typical NAPI drivers use napi_consume_skb(skb) at TX completion time. This put skb in a percpu special queue, napi_alloc_cache, to get bulk frees. It turns out the queue is not flushed and hits the NAPI_SKB_CACHE_SIZE limit quite often, with skbs that were queued hundreds of usec earlier. I measured this can take ~6000 nsec to perform one flush. __kfree_skb_flush() can be called from two points right now : 1) From net_tx_action(), but only for skbs that were queued to sd->completion_queue. -> Irrelevant for NAPI drivers in normal operation. 2) From net_rx_action(), but only under high stress or if RPS/RFS has a pending action. This patch changes net_rx_action() to perform the flush in all cases and after more urgent operations happened (like kicking remote CPUS for RPS/RFS). Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Mack 提交于
If the cgroup associated with the receiving socket has an eBPF programs installed, run them from ip_output(), ip6_output() and ip_mc_output(). From mentioned functions we have two socket contexts as per 7026b1dd ("netfilter: Pass socket pointer down through okfn()."). We explicitly need to use sk instead of skb->sk here, since otherwise the same program would run multiple times on egress when encap devices are involved, which is not desired in our case. eBPF programs used in this context are expected to either return 1 to let the packet pass, or != 1 to drop them. The programs have access to the skb through bpf_skb_load_bytes(), and the payload starts at the network headers (L3). Note that cgroup_bpf_run_filter() is stubbed out as static inline nop for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if the feature is unused. Signed-off-by: NDaniel Mack <daniel@zonque.org> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Mack 提交于
If the cgroup associated with the receiving socket has an eBPF programs installed, run them from sk_filter_trim_cap(). eBPF programs used in this context are expected to either return 1 to let the packet pass, or != 1 to drop them. The programs have access to the skb through bpf_skb_load_bytes(), and the payload starts at the network headers (L3). Note that cgroup_bpf_run_filter() is stubbed out as static inline nop for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if the feature is unused. Signed-off-by: NDaniel Mack <daniel@zonque.org> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Mack 提交于
This program type is similar to BPF_PROG_TYPE_SOCKET_FILTER, except that it does not allow BPF_LD_[ABS|IND] instructions and hooks up the bpf_skb_load_bytes() helper. Programs of this type will be attached to cgroups for network filtering and accounting. Signed-off-by: NDaniel Mack <daniel@zonque.org> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 11月, 2016 8 次提交
-
-
由 Roman Mashak 提交于
Should pass valid filter handle, not the netlink flags. Fixes: 30a391a1 ("net sched filters: pass netlink message flags in event notification") Signed-off-by: NRoman Mashak <mrv@mojatatu.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Reported-by: NCong Wang <xiyou.wangcong@gmail.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Fainelli 提交于
PHY drivers should be able to rely on the caller of {get,set}_tunable to have acquired the PHY device mutex, in order to both serialize against concurrent calls of these functions, but also against PHY state machine changes. All ethtool PHY-level functions do this, except {get,set}_tunable, so we make them consistent here as well. We need to update the Microsemi PHY driver in the same commit to avoid introducing either deadlocks, or lack of proper locking. Fixes: 968ad9da ("ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE") Fixes: 310d9ad5 ("net: phy: Add downshift get/set support in Microsemi PHYs driver") Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Reviewed-by: NAllan W. Nielsen <allan.nielsen@microsemi.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roi Dayan 提交于
Some HWs need the VF driver to put part of the packet headers on the TX descriptor so the e-switch can do proper matching and steering. The supported modes: none, link, network, transport. Signed-off-by: NRoi Dayan <roid@mellanox.com> Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Or Gerlitz 提交于
Some drivers would need to check few internal matters for that. To be used in downstream mlx5 commit. Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
In commit 2331ccc5 ("tcp: enhance tcp collapsing"), we made a first step allowing copying right skb to left skb head. Since all skbs in socket write queue are headless (but possibly the very first one), this strategy often does not work. This patch extends tcp_collapse_retrans() to perform frag shifting, thanks to skb_shift() helper. This helper needs to not BUG on non headless skbs, as callers are ok with that. Tested: Following packetdrill test now passes : 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> +.100 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 +0 write(4, ..., 200) = 200 +0 > P. 1:201(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 201:401(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 401:601(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 601:801(200) ack 1 +.001 write(4, ..., 200) = 200 +0 > P. 801:1001(200) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1001:1101(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1101:1201(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1201:1301(100) ack 1 +.001 write(4, ..., 100) = 100 +0 > P. 1301:1401(100) ack 1 +.099 < . 1:1(0) ack 201 win 257 +.001 < . 1:1(0) ack 201 win 257 <nop,nop,sack 1001:1401> +0 > P. 201:1001(800) ack 1 Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
In commits 93821778 ("udp: Fix rcv socket locking") and f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite was forgotten. This leads to crashes if UDPlite header is pulled twice, which happens starting from commit e6afc8ac ("udp: remove headers from UDP packets before queueing") Bug found by syzkaller team, thanks a lot guys ! Note that backlog use in UDP/UDPlite is scheduled to be removed starting from linux-4.10, so this patch is only needed up to linux-4.9 Fixes: 93821778 ("udp: Fix rcv socket locking") Fixes: f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb") Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NAndrey Konovalov <andreyknvl@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
When an ipv6 address has the tentative flag set, it can't be used as source for egress traffic, while the associated route, if any, can be looked up and even stored into some dst_cache. In the latter scenario, the source ipv6 address selected and stored in the cache is most probably wrong (e.g. with link-local scope) and the entity using the dst_cache will experience lack of ipv6 connectivity until said cache is cleared or invalidated. Overall this may cause lack of connectivity over most IPv6 tunnels (comprising geneve and vxlan), if the first egress packet reaches the tunnel before the DaD is completed for the used ipv6 address. This patch bumps a new genid after that the IFA_F_TENTATIVE flag is cleared, so that dst_cache will be invalidated on next lookup and ipv6 connectivity restored. Fixes: 0c1d70af ("net: use dst_cache for vxlan device") Fixes: 468dfffc ("geneve: add dst caching support") Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stefan Hajnoczi 提交于
The VMware VMCI transport supports loopback inside virtual machines. This patch implements loopback for virtio-vsock. Flow control is handled by the virtio-vsock protocol as usual. The sending process stops transmitting on a connection when the peer's receive buffer space is exhausted. Cathy Avery <cavery@redhat.com> noticed this difference between VMCI and virtio-vsock when a test case using loopback failed. Although loopback isn't the main point of AF_VSOCK, it is useful for testing and virtio-vsock must match VMCI semantics so that userspace programs run regardless of the underlying transport. My understanding is that loopback is not supported on the host side with VMCI. Follow that by implementing it only in the guest driver, not the vhost host driver. Cc: Jorgen Hansen <jhansen@vmware.com> Reported-by: NCathy Avery <cavery@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 11月, 2016 2 次提交
-
-
由 WANG Cong 提交于
This reverts commit 7c6ae610, because l2tp_xmit_skb() never returns NET_XMIT_CN, it ignores the return value of l2tp_xmit_core(). Cc: Gao Feng <gfree.wind@gmail.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Zhang Shengju 提交于
For RT netlink, calcit() function should return the minimal size for netlink dump message. This will make sure that dump message for every network device can be stored. Currently, rtnl_calcit() function doesn't account the size of header of netlink message, this patch will fix it. Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 11月, 2016 3 次提交
-
-
由 Oliver Hartkopp 提交于
Since commit 6f3b911d ("can: bcm: add support for CAN FD frames") the CAN broadcast manager supports CAN and CAN FD data frames. As these data frames are embedded in struct can[fd]_frames which have a different length the access to the provided array of CAN frames became dependend of op->cfsiz. By using a struct canfd_frame pointer for the array of CAN frames the new offset calculation based on op->cfsiz was accidently applied to CAN FD frame element lengths. This fix makes the pointer to the arrays of the different CAN frame types a void pointer so that the offset calculation in bytes accesses the correct CAN frame elements. Reference: http://marc.info/?l=linux-netdev&m=147980658909653Reported-by: NAndrey Konovalov <andreyknvl@google.com> Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net> Tested-by: NAndrey Konovalov <andreyknvl@google.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
-
由 Johan Hedberg 提交于
The hci_get_route() API is used to look up local HCI devices, however so far it has been incapable of dealing with anything else than the public address of HCI devices. This completely breaks with LE-only HCI devices that do not come with a public address, but use a static random address instead. This patch exteds the hci_get_route() API with a src_type parameter that's used for comparing with the right address of each HCI device. Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com> Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
-
由 Eric Dumazet 提交于
Andre Noll reported panics after my recent fix (commit 34fad54c "net: __skb_flow_dissect() must cap its return value") After some more headaches, Alexander root caused the problem to init_default_flow_dissectors() being called too late, in case a network driver like IGB is not a module and receives DHCP message very early. Fix is to call init_default_flow_dissectors() much earlier, as it is a core infrastructure and does not depend on another kernel service. Fixes: 06635a35 ("flow_dissect: use programable dissector in skb_flow_dissect and friends") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NAndre Noll <maan@tuebingen.mpg.de> Diagnosed-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 11月, 2016 2 次提交
-
-
由 Florian Westphal 提交于
The undo_cwnd fallback in the stack doubles cwnd based on ssthresh, which un-does reno halving behaviour. It seems more appropriate to let congctl algorithms pair .ssthresh and .undo_cwnd properly. Add a 'tcp_reno_undo_cwnd' function and wire it up for all congestion algorithms that used to rely on the fallback. Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
congestion control algorithms that do not halve cwnd in their .ssthresh should provide a .cwnd_undo rather than rely on current fallback which assumes reno halving (and thus doubles the cwnd). All of these do 'something else' in their .ssthresh implementation, thus store the cwnd on loss and provide .undo_cwnd to restore it again. A followup patch will remove the fallback and all algorithms will need to provide a .cwnd_undo function. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-