- 15 8月, 2014 7 次提交
-
-
由 Johannes Berg 提交于
We currently track the QoS capability twice: for all peer stations in the WLAN_STA_WME flag, and for any clients associated to an AP interface separately for drivers in the sta->sta.wme field. Remove the WLAN_STA_WME flag and track the capability only in the driver-visible field, getting rid of the limitation that the field is only valid in AP mode. Reviewed-by: NArik Nemtsov <arik@wizery.com> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Thomas Graf 提交于
Silences the following sparse warnings: net/netlink/af_netlink.c:2926:21: warning: context imbalance in 'netlink_seq_start' - wrong count at exit net/netlink/af_netlink.c:2972:13: warning: context imbalance in 'netlink_seq_stop' - unexpected unlock Signed-off-by: NThomas Graf <tgraf@suug.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neal Cardwell 提交于
Fix TCP FRTO logic so that it always notices when snd_una advances, indicating that any RTO after that point will be a new and distinct loss episode. Previously there was a very specific sequence that could cause FRTO to fail to notice a new loss episode had started: (1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue (2) receiver ACKs packet 1 (3) FRTO sends 2 more packets (4) RTO timer fires again (should start a new loss episode) The problem was in step (3) above, where tcp_process_loss() returned early (in the spot marked "Step 2.b"), so that it never got to the logic to clear icsk_retransmits. Thus icsk_retransmits stayed non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero icsk_retransmits, decide that this RTO is not a new episode, and decide not to cut ssthresh and remember the current cwnd and ssthresh for undo. There were two main consequences to the bug that we have observed. First, ssthresh was not decreased in step (4). Second, when there was a series of such FRTO (1-4) sequences that happened to be followed by an FRTO undo, we would restore the cwnd and ssthresh from before the entire series started (instead of the cwnd and ssthresh from before the most recent RTO). This could result in cwnd and ssthresh being restored to values much bigger than the proper values. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Fixes: e33099f9 ("tcp: implement RFC5682 F-RTO") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
tcp_tw_recycle heavily relies on tcp timestamps to build a per-host ordering of incoming connections and teardowns without the need to hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for the last measured RTO. To do so, we keep the last seen timestamp in a per-host indexed data structure and verify if the incoming timestamp in a connection request is strictly greater than the saved one during last connection teardown. Thus we can verify later on that no old data packets will be accepted by the new connection. During moving a socket to time-wait state we already verify if timestamps where seen on a connection. Only if that was the case we let the time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN will be used. But we don't verify this on incoming SYN packets. If a connection teardown was less than TCP_PAWS_MSL seconds in the past we cannot guarantee to not accept data packets from an old connection if no timestamps are present. We should drop this SYN packet. This patch closes this loophole. Please note, this patch does not make tcp_tw_recycle in any way more usable but only adds another safety check: Sporadic drops of SYN packets because of reordering in the network or in the socket backlog queues can happen. Users behing NAT trying to connect to a tcp_tw_recycle enabled server can get caught in blackholes and their connection requests may regullary get dropped because hosts behind an address translator don't have synchronized tcp timestamp clocks. tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled. In general, use of tcp_tw_recycle is disadvised. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neal Cardwell 提交于
Make sure we use the correct address-family-specific function for handling MTU reductions from within tcp_release_cb(). Previously AF_INET6 sockets were incorrectly always using the IPv6 code path when sometimes they were handling IPv4 traffic and thus had an IPv4 dst. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Diagnosed-by: NWillem de Bruijn <willemb@google.com> Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications") Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Shmulik Ladkani 提交于
As of 4fddbf5d ("sit: strictly restrict incoming traffic to tunnel link device"), when looking up a tunnel, tunnel's underlying interface (t->parms.link) is verified to match incoming traffic's ingress device. However the comparison was incorrectly based on skb->dev->iflink. Instead, dev->ifindex should be used, which correctly represents the interface from which the IP stack hands the ipip6 packets. This allows setting up sit tunnels bound to vlan interfaces (otherwise incoming ipip6 traffic on the vlan interface was dropped due to ipip6_tunnel_lookup match failure). Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andrey Vagin 提交于
We don't know right timestamp for repaired skb-s. Wrong RTT estimations isn't good, because some congestion modules heavily depends on it. This patch adds the TCPCB_REPAIRED flag, which is included in TCPCB_RETRANS. Thanks to Eric for the advice how to fix this issue. This patch fixes the warning: [ 879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380() [ 879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1 [ 879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 879.568177] 0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2 [ 879.568776] 0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80 [ 879.569386] ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc [ 879.569982] Call Trace: [ 879.570264] [<ffffffff817aa2d2>] dump_stack+0x4d/0x66 [ 879.570599] [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0 [ 879.570935] [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20 [ 879.571292] [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380 [ 879.571614] [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710 [ 879.571958] [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370 [ 879.572315] [<ffffffff81657459>] release_sock+0x89/0x1d0 [ 879.572642] [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860 [ 879.573000] [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80 [ 879.573352] [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40 [ 879.573678] [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20 [ 879.574031] [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0 [ 879.574393] [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b [ 879.574730] ---[ end trace a17cbc38eb8c5c00 ]--- v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit, where it's set for other skb-s. Fixes: 431a9124 ("tcp: timestamp SYN+DATA messages") Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution") Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: NAndrey Vagin <avagin@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 8月, 2014 6 次提交
-
-
由 Willem de Bruijn 提交于
Bytestream timestamps are correlated with a single byte in the skbuff, recorded in skb_shinfo(skb)->tskey. When fragmenting skbuffs, ensure that the tskey is set for the fragment in which the tskey falls (seqno <= tskey < end_seqno). The original implementation did not address fragmentation in tcp_fragment or tso_fragment. Add code to inspect the sequence numbers and move both tskey and the relevant tx_flags if necessary. Reported-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
ACK timestamps are generated in tcp_clean_rtx_queue. The TSO datapath can break out early, causing the timestamp code to be skipped. Move the code up before the break. Reported-by: NDavid S. Miller <davem@davemloft.net> Also fix a boundary condition: tp->snd_una is the next unacknowledged byte and between tests inclusive (a <= b <= c), so generate a an ACK timestamp if (prior_snd_una <= tskey <= tp->snd_una - 1). Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Maks Naumov 提交于
Signed-off-by: NMaks Naumov <maksqwe1@ukr.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
b67bfe0d (hlist: drop the node parameter from iterators) dropped the node parameter from iterators which lec_tbl_walk() was using to iterate the list. Signed-off-by: NChas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
One should not call blocking primitives inside a wait loop, since both require task_struct::state to sleep, so the inner will destroy the outer state. sigd_enq() will possibly sleep for alloc_skb(). Move sigd_enq() before prepare_to_wait() to avoid sleeping while waiting interruptibly. You do not actually need to call sigd_enq() after the initial prepare_to_wait() because we test the termination condition before calling schedule(). Based on suggestions from Peter Zijlstra. Signed-off-by: NChas Williams <chas@cmf.n4rl.navy.mil> Acked-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christoph Jaeger 提交于
ovs_vport_alloc() bails out without freeing the memory 'vport' points to. Picked up by Coverity - CID 1230503. Fixes: 5cd667b0 ("openvswitch: Allow each vport to have an array of 'port_id's.") Signed-off-by: NChristoph Jaeger <cj@linux.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 8月, 2014 1 次提交
-
-
由 Vlad Yasevich 提交于
Currently the functionality to untag traffic on input resides as part of the vlan module and is build only when VLAN support is enabled in the kernel. When VLAN is disabled, the function vlan_untag() turns into a stub and doesn't really untag the packets. This seems to create an interesting interaction between VMs supporting checksum offloading and some network drivers. There are some drivers that do not allow the user to change tx-vlan-offload feature of the driver. These drivers also seem to assume that any VLAN-tagged traffic they transmit will have the vlan information in the vlan_tci and not in the vlan header already in the skb. When transmitting skbs that already have tagged data with partial checksum set, the checksum doesn't appear to be updated correctly by the card thus resulting in a failure to establish TCP connections. The following is a packet trace taken on the receiver where a sender is a VM with a VLAN configued. The host VM is running on doest not have VLAN support and the outging interface on the host is tg3: 10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243, offset 0, flags [DF], proto TCP (6), length 60) 10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val 4294837885 ecr 0,nop,wscale 7], length 0 10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244, offset 0, flags [DF], proto TCP (6), length 60) 10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val 4294838888 ecr 0,nop,wscale 7], length 0 This connection finally times out. I've only access to the TG3 hardware in this configuration thus have only tested this with TG3 driver. There are a lot of other drivers that do not permit user changes to vlan acceleration features, and I don't know if they all suffere from a similar issue. The patch attempt to fix this another way. It moves the vlan header stipping code out of the vlan module and always builds it into the kernel network core. This way, even if vlan is not supported on a virtualizatoin host, the virtual machines running on top of such host will still work with VLANs enabled. CC: Patrick McHardy <kaber@trash.net> CC: Nithin Nayak Sujir <nsujir@broadcom.com> CC: Michael Chan <mchan@broadcom.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com> Acked-by: NJiri Pirko <jiri@resnulli.us> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 8月, 2014 3 次提交
-
-
由 Ilya Dryomov 提交于
Determining ->last_piece based on the value of ->page_offset + length is incorrect because length here is the length of the entire message. ->last_piece set to false even if page array data item length is <= PAGE_SIZE, which results in invalid length passed to ceph_tcp_{send,recv}page() and causes various asserts to fire. # cat pages-cursor-init.sh #!/bin/bash rbd create --size 10 --image-format 2 foo FOO_DEV=$(rbd map foo) dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null rbd snap create foo@snap rbd snap protect foo@snap rbd clone foo@snap bar # rbd_resize calls librbd rbd_resize(), size is in bytes ./rbd_resize bar $(((4 << 20) + 512)) rbd resize --size 10 bar BAR_DEV=$(rbd map bar) # trigger a 512-byte copyup -- 512-byte page array data item dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5 The problem exists only in ceph_msg_data_pages_cursor_init(), ceph_msg_data_pages_advance() does the right thing. The size_t cast is unnecessary. Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Jiri Benc 提交于
Commit 1d8faf48 ("net/core: Add VF link state control") added new attribute to IFLA_VF_INFO group in rtnl_fill_ifinfo but did not adjust size of the allocated memory in if_nlmsg_size/rtnl_vfinfo_size. As the result, we may trigger warnings in rtnl_getlink and similar functions when many VF links are enabled, as the information does not fit into the allocated skb. Fixes: 1d8faf48 ("net/core: Add VF link state control") Reported-by: NYulong Pei <ypei@redhat.com> Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Niv Yehezkel 提交于
Since fib_lookup cannot return ESRCH no longer, checking for this error code is no longer neccesary. Signed-off-by: NNiv Yehezkel <executerx@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 8月, 2014 8 次提交
-
-
由 Julia Lawall 提交于
Convert a zero return value on error to a negative one, as returned elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier ret; expression e1,e2; @@ ( if (\(ret < 0\|ret != 0\)) { ... return ret; } | ret = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
Eric Dumazet reports that getsockopt() or setsockopt() sometimes returns -EINTR instead of -ENOPROTOOPT, causing headaches to application developers. This patch replaces all the mutex_lock_interruptible() by mutex_lock() in the netfilter tree, as there is no reason we should sleep for a long time there. Reported-by: NEric Dumazet <edumazet@google.com> Suggested-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Acked-by: NJulian Anastasov <ja@ssi.bg>
-
由 Pablo Neira Ayuso 提交于
Fix possible replacement of the per-cpu chain counters by null pointer when updating an existing chain in the commit path. Reported-by: NMatteo Croce <technoboy85@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
This should happen once the element has been effectively released in the commit path, not before. This fixes a possible chain refcount leak if the transaction is aborted. Reported-by: NThomas Graf <tgraf@suug.ch> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Daniel Borkmann 提交于
netlink doesn't set any network header offset thus when the skb is being passed to tap devices via dev_queue_xmit_nit(), it emits klog false positives due to it being unset like: ... [ 124.990397] protocol 0000 is buggy, dev nlmon0 [ 124.990411] protocol 0000 is buggy, dev nlmon0 ... So just reset the network header before passing to the device; for packet sockets that just means nothing will change - mac and net offset hold the same value just as before. Reported-by: NMarcel Holtmann <marcel@holtmann.org> Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jean Sacren 提交于
The header multicast.h was included twice, so delete one of them. Signed-off-by: NJean Sacren <sakiwit@gmail.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <antonio@meshcoding.com> Cc: b.a.t.m.a.n@lists.open-mesh.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jean Sacren 提交于
The #include headers net/genetlink.h and linux/genetlink.h both were included twice, so delete each of the duplicate. Signed-off-by: NJean Sacren <sakiwit@gmail.com> Cc: Pravin Shelar <pshelar@nicira.com> Cc: dev@openvswitch.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Geert Uytterhoeven 提交于
Change config symbol 6LOWPAN from type bool to type tristate, so 6LoWPAN can be built modular, just like IPV6 Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Acked-by: NMarcel Holtmann <marcel@holtmann.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 8月, 2014 5 次提交
-
-
由 Thomas Graf 提交于
Although RCU protection would be possible during diag dump, doing so allows for concurrent table mutations which can render the in-table offset between individual Netlink messages invalid and thus cause legitimate sockets to be skipped in the dump. Since the diag dump is relatively low volume and consistency is more important than performance, the table mutex is held during dump. Reported-by: NAndrey Wagin <avagin@gmail.com> Signed-off-by: NThomas Graf <tgraf@suug.ch> Fixes: e341694e ("netlink: Convert netlink_lookup() to use RCU protected hash table") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ken Helias 提交于
All other add functions for lists have the new item as first argument and the position where it is added as second argument. This was changed for no good reason in this function and makes using it unnecessary confusing. The name was changed to hlist_add_behind() to cause unconverted code to generate a compile error instead of using the wrong parameter order. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NKen Helias <kenhelias@firemail.de> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [intel driver bits] Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dmitry Popov 提交于
Since a8afca03 (tcp: md5: protects md5sig_info with RCU) tcp_md5_do_lookup doesn't require socket lock, rcu_read_lock is enough. Therefore socket lock is no longer required for tcp_v{4,6}_inbound_md5_hash too, so we can move these calls (wrapped with rcu_read_{,un}lock) before bh_lock_sock: from tcp_v{4,6}_do_rcv to tcp_v{4,6}_rcv. Signed-off-by: NDmitry Popov <ixaphire@qrator.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
A set of small fixes pointed out just after the merge: - make tcp_tx_timestamp static - make tcp_gso_tstamp static - use before() to compare TCP seqno, instead of cast to u64 - add tstamp to tx_flags in GSO, instead of overwrite tx_flags - record skb_shinfo(skb)->tskey for all timestamps, also HW. - optimization in tcp_tx_timestamp: call sock_tx_timestamp only if a tstamp option is set. Signed-off-by: NWillem de Bruijn <willemb@google.com> Fixes: 4ed2d765 ("net-timestamp: TCP timestamping") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
sock_tx_timestamp() should not ignore initial *tx_flags value, as TCP stack can store SKBTX_SHARED_FRAG in it. Also first argument (struct sock *) can be const. Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: 4ed2d765 ("net-timestamp: TCP timestamping") Cc: Willem de Bruijn <willemb@google.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 8月, 2014 10 次提交
-
-
由 Eric Dumazet 提交于
Dave reported following splat, caused by improper use of IP_INC_STATS_BH() in process context. BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551 caller is __this_cpu_preempt_check+0x13/0x20 CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33 ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207 0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40 ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3 Call Trace: [<ffffffff9e7ee207>] dump_stack+0x4e/0x7a [<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100 [<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20 [<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp] [<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp] [<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0 [<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp] [<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp] [<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp] [<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100 [<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp] [<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp] [<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp] [<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160 [<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30 [<ffffffff9e73b624>] inet_sendmsg+0x104/0x220 [<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220 [<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0 [<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0 [<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0 [<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0 [<ffffffff9e0136b0>] ? syscall_trace_enter+0x250/0x330 [<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10 [<ffffffff9e7f9be4>] tracesys+0xdd/0xe2 This is a followup of commits f1d8cba6 ("inet: fix possible seqlock deadlocks") and 7f88c6b2 ("ipv6: fix possible seqlock deadlock in ip6_finish_output2") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: NDave Jones <davej@redhat.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Toshiaki Makita 提交于
Now bridge ports can be non-promiscuous, vlan_vid_add() is no longer an unnecessary operation. Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte in the send() call is acknowledged. It implements the feature for TCP. The timestamp is generated when the TCP socket cumulative ACK is moved beyond the tracked seqno for the first time. The feature ignores SACK and FACK, because those acknowledge the specific byte, but not necessarily the entire contents of the buffer up to that byte. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
TCP timestamping extends SO_TIMESTAMPING to bytestreams. Bytestreams do not have a 1:1 relationship between send() buffers and network packets. The feature interprets a send call on a bytestream as a request for a timestamp for the last byte in that send() buffer. The choice corresponds to a request for a timestamp when all bytes in the buffer have been sent. That assumption depends on in-order kernel transmission. This is the common case. That said, it is possible to construct a traffic shaping tree that would result in reordering. The guarantee is strong, then, but not ironclad. This implementation supports send and sendpages (splice). GSO replaces one large packet with multiple smaller packets. This patch also copies the option into the correct smaller packet. This patch does not yet support timestamping on data in an initial TCP Fast Open SYN, because that takes a very different data path. If ID generation in ee_data is enabled, bytestream timestamps return a byte offset, instead of the packet counter for datagrams. The implementation supports a single timestamp per packet. It silenty replaces requests for previous timestamps. To avoid missing tstamps, flush the tcp queue by disabling Nagle, cork and autocork. Missing tstamps can be detected by offset when the ee_data ID is enabled. Implementation details: - On GSO, the timestamping code can be included in the main loop. I moved it into its own loop to reduce the impact on the common case to a single branch. - To avoid leaking the absolute seqno to userspace, the offset returned in ee_data must always be relative. It is an offset between an skb and sk field. The first is always set (also for GSO & ACK). The second must also never be uninitialized. Only allow the ID option on sockets in the ESTABLISHED state, for which the seqno is available. Never reset it to zero (instead, move it to the current seqno when reenabling the option). Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Kernel transmit latency is often incurred in the packet scheduler. Introduce a new timestamp on transmission just before entering the scheduler. When data travels through multiple devices (bonding, tunneling, ...) each device will export an individual timestamp. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Datagrams timestamped on transmission can coexist in the kernel stack and be reordered in packet scheduling. When reading looped datagrams from the socket error queue it is not always possible to unique correlate looped data with original send() call (for application level retransmits). Even if possible, it may be expensive and complex, requiring packet inspection. Introduce a data-independent ID mechanism to associate timestamps with send calls. Pass an ID alongside the timestamp in field ee_data of sock_extended_err. The ID is a simple 32 bit unsigned int that is associated with the socket and incremented on each send() call for which software tx timestamp generation is enabled. The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to avoid changing ee_data for existing applications that expect it 0. The counter is reset each time the flag is reenabled. Reenabling does not change the ID of already submitted data. It is possible to receive out of order IDs if the timestamp stream is not quiesced first. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
sk_flags is reaching its limit. New timestamping options will not fit. Move all of them into a new field sk->sk_tsflags. Added benefit is that this removes boilerplate code to convert between SOF_TIMESTAMPING_.. and SOCK_TIMESTAMPING_.. in getsockopt/setsockopt. SOCK_TIMESTAMPING_RX_SOFTWARE is also used to toggle the receive timestamp logic (netstamp_needed). That can be simplified and this last key removed, but will leave that for a separate patch. Signed-off-by: NWillem de Bruijn <willemb@google.com> ---- The u16 in sock can be moved into a 16-bit hole below sk_gso_max_segs, though that scatters tstamp fields throughout the struct. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Applications that request kernel tx timestamps with SO_TIMESTAMPING read timestamps as recvmsg() ancillary data. The response is defined implicitly as timespec[3]. 1) define struct scm_timestamping explicitly and 2) add support for new tstamp types. On tx, scm_timestamping always accompanies a sock_extended_err. Define previously unused field ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to define the existing behavior. The reception path is not modified. On rx, no struct similar to sock_extended_err is passed along with SCM_TIMESTAMPING. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neal Cardwell 提交于
This commit reduces spurious retransmits due to apparent SACK reneging by only reacting to SACK reneging that persists for a short delay. When a sequence space hole at snd_una is filled, some TCP receivers send a series of ACKs as they apparently scan their out-of-order queue and cumulatively ACK all the packets that have now been consecutiveyly received. This is essentially misbehavior B in "Misbehaviors in TCP SACK generation" ACM SIGCOMM Computer Communication Review, April 2011, so we suspect that this is from several common OSes (Windows 2000, Windows Server 2003, Windows XP). However, this issue has also been seen in other cases, e.g. the netdev thread "TCP being hoodwinked into spurious retransmissions by lack of timestamps?" from March 2014, where the receiver was thought to be a BSD box. Since snd_una would temporarily be adjacent to a previously SACKed range in these scenarios, this receiver behavior triggered the Linux SACK reneging code path in the sender. This led the sender to clear the SACK scoreboard, enter CA_Loss, and spuriously retransmit (potentially) every packet from the entire write queue at line rate just a few milliseconds before the ACK for each packet arrives at the sender. To avoid such situations, now when a sender sees apparent reneging it does not yet retransmit, but rather adjusts the RTO timer to give the receiver a little time (max(RTT/2, 10ms)) to send us some more ACKs that will restore sanity to the SACK scoreboard. If the reneging persists until this RTO then, as before, we clear the SACK scoreboard and enter CA_Loss. A 10ms delay tolerates a receiver sending such a stream of ACKs at 56Kbit/sec. And to allow for receivers with slower or more congested paths, we wait for at least RTT/2. We validated the resulting max(RTT/2, 10ms) delay formula with a mix of North American and South American Google web server traffic, and found that for ACKs displaying transient reneging: (1) 90% of inter-ACK delays were less than 10ms (2) 99% of inter-ACK delays were less than RTT/2 In tests on Google web servers this commit reduced reneging events by 75%-90% (as measured by the TcpExtTCPSACKReneging counter), without any measurable impact on latency for user HTTP and SPDY requests. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Steve Wise 提交于
In svc_rdma_accept(), if rdma_create_qp() fails, there is useless logic to try and call rdma_create_qp() again with reduced sge depths. The assumption, I guess, was that perhaps the initial sge depths chosen were too big. However they initial depths are selected based on the rdma device attribute max_sge returned from ib_query_device(). If rdma_create_qp() fails, it would not be because the max_send_sge and max_recv_sge values passed in exceed the device's max. So just remove this code. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-