- 28 4月, 2016 3 次提交
-
-
由 Eric Dumazet 提交于
Rename NET_INC_STATS_BH() to __NET_INC_STATS() and NET_ADD_STATS_BH() to __NET_ADD_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename ICMP6_INC_STATS_BH() to __ICMP6_INC_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename DCCP_INC_STATS_BH() to __DCCP_INC_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 4月, 2016 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
The socket is either locked if we hold the slock spin_lock for lock_sock_fast and unlock_sock_fast or we own the lock (sk_lock.owned != 0). Check for this and at the same time improve that the current thread/cpu is really holding the lock. Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 4月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes. By letting listeners use SOCK_RCU_FREE infrastructure, we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets, only listeners are impacted by this change. Peak performance under SYNFLOOD is increased by ~33% : On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps Most consuming functions are now skb_set_owner_w() and sock_wfree() contending on sk->sk_wmem_alloc when cooking SYNACK and freeing them. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 2月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
Ilya reported following lockdep splat: kernel: ========================= kernel: [ BUG: held lock freed! ] kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted kernel: ------------------------- kernel: swapper/5/0 is freeing memory ffff880035c9d200-ffff880035c9dbff, with a lock still held there! kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at: [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0 kernel: 4 locks held by swapper/5/0: kernel: #0: (rcu_read_lock){......}, at: [<ffffffff8169ef6b>] netif_receive_skb_internal+0x4b/0x1f0 kernel: #1: (rcu_read_lock){......}, at: [<ffffffff816e977f>] ip_local_deliver_finish+0x3f/0x380 kernel: #2: (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>] sk_clone_lock+0x19b/0x440 kernel: #3: (&(&queue->rskq_lock)->rlock){+.-...}, at: [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0 To properly fix this issue, inet_csk_reqsk_queue_add() needs to return to its callers if the child as been queued into accept queue. We also need to make sure listener is still there before calling sk->sk_data_ready(), by holding a reference on it, since the reference carried by the child can disappear as soon as the child is put on accept queue. Reported-by: NIlya Dryomov <idryomov@gmail.com> Fixes: ebb516af ("tcp/dccp: fix race at listener dismantle phase") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 2月, 2016 2 次提交
-
-
由 Craig Gallek 提交于
This is a preliminary step to allow fast socket lookup of SO_REUSEPORT groups. Doing so with a BPF filter will require access to the skb in question. This change plumbs the skb (and offset to payload data) through the call stack to the listening socket lookup implementations where it will be used in a following patch. Signed-off-by: NCraig Gallek <kraig@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Craig Gallek 提交于
In order to support fast lookups for TCP sockets with SO_REUSEPORT, the function that adds sockets to the listening hash set needs to be able to check receive address equality. Since this equality check is different for IPv4 and IPv6, we will need two different socket hashing functions. This patch adds inet6_hash identical to the existing inet_hash function and updates the appropriate references. A following patch will differentiate the two by passing different comparison functions to __inet_hash. Additionally, in order to use the IPv6 address equality function from inet6_hashtables (which is compiled as a built-in object when IPv6 is enabled) it also needs to be in a built-in object file as well. This moves ipv6_rcv_saddr_equal into inet_hashtables to accomplish this. Signed-off-by: NCraig Gallek <kraig@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 12月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
While testing the np->opt RCU conversion, I found that UDP/IPv6 was using a mixture of xchg() and sk_dst_lock to protect concurrent changes to sk->sk_dst_cache, leading to possible corruptions and crashes. ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest way to fix the mess is to remove sk_dst_lock completely, as we did for IPv4. __ip6_dst_store() and ip6_dst_store() share same implementation. sk_setup_caps() being called with socket lock being held or not, we have to use sk_dst_set() instead of __sk_dst_set() Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);" in ip6_dst_store() before the sk_setup_caps(sk, dst) call. This is because ip6_dst_store() can be called from process context, without any lock held. As soon as the dst is installed in sk->sk_dst_cache, dst can be freed from another cpu doing a concurrent ip6_dst_store() Doing the dst dereference before doing the install is needed to make sure no use after free would trigger. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 12月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
This patch addresses multiple problems : UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions while socket is not locked : Other threads can change np->opt concurrently. Dmitry posted a syzkaller (http://github.com/google/syzkaller) program desmonstrating use-after-free. Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock() and dccp_v6_request_recv_sock() also need to use RCU protection to dereference np->opt once (before calling ipv6_dup_options()) This patch adds full RCU protection to np->opt Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 11月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
IPv6 request sockets store a pointer to skb containing the SYN packet to be able to transfer it to full blown socket when 3WHS is done (ireq->pktopts -> np->pktoptions) As explained in commit 5e0724d0 ("tcp/dccp: fix hashdance race for passive sessions"), we must transfer the skb only if we won the hashdance race, if multiple cpus receive the 'ack' packet completing 3WHS at the same time. Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 10月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
Multiple cpus can process duplicates of incoming ACK messages matching a SYN_RECV request socket. This is a rare event under normal operations, but definitely can happen. Only one must win the race, otherwise corruption would occur. To fix this without adding new atomic ops, we use logic in inet_ehash_nolisten() to detect the request was present in the same ehash bucket where we try to insert the new child. If request socket was not found, we have to undo the child creation. This actually removes a spin_lock()/spin_unlock() pair in reqsk_queue_unlink() for the fast path. Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 10月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
Let's reduce the confusion about inet_csk_reqsk_queue_drop() : In many cases we also need to release reference on request socket, so add a helper to do this, reducing code size and complexity. Fixes: 4bdc3d66 ("tcp/dccp: fix behavior of stale SYN_RECV request sockets") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 10月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
When a TCP/DCCP listener is closed, its pending SYN_RECV request sockets become stale, meaning 3WHS can not complete. But current behavior is wrong : incoming packets finding such stale sockets are dropped. We need instead to cleanup the request socket and perform another lookup : - Incoming ACK will give a RST answer, - SYN rtx might find another listener if available. - We expedite cleanup of request sockets and old listener socket. Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 10月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
inet_reqsk_alloc() is used to allocate a temporary request in order to generate a SYNACK with a cookie. Then later, syncookie validation also uses a temporary request. These paths already took a reference on listener refcount, we can avoid a couple of atomic operations. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 10月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
In this patch, we insert request sockets into TCP/DCCP regular ehash table (where ESTABLISHED and TIMEWAIT sockets are) instead of using the per listener hash table. ACK packets find SYN_RECV pseudo sockets without having to find and lock the listener. In nominal conditions, this halves pressure on listener lock. Note that this will allow for SO_REUSEPORT refinements, so that we can select a listener using cpu/numa affinities instead of the prior 'consistent hash', since only SYN packets will apply this selection logic. We will shrink listen_sock in the following patch to ease code review. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Ying Cai <ycai@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 9月, 2015 3 次提交
-
-
由 Eric Dumazet 提交于
We'll soon no longer hold listener socket lock, these functions do not modify the socket in any way. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Before changing dccp_v6_request_recv_sock() sock argument to const, we need to get rid of security_sk_classify_flow(), and it seems doable by reusing inet6_csk_route_req() helper. We need to add a proto parameter to inet6_csk_route_req(), not assume it is TCP. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
None of these functions need to change the socket, make it const. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 9月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
This is done to make sure we do not change listener socket while sending SYNACK packets while socket lock is not held. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 4月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
[ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 [ 3897.931025] IP: [<ffffffffa9f27686>] reqsk_timer_handler+0x1a6/0x243 There is a race when reqsk_timer_handler() and tcp_check_req() call inet_csk_reqsk_queue_unlink() on the same req at the same time. Before commit fa76ce73 ("inet: get rid of central tcp/dccp listener timer"), listener spinlock was held and race could not happen. To solve this bug, we change reqsk_queue_unlink() to not assume req must be found, and we return a status, to conditionally release a refcount on the request sock. This also means tcp_check_req() in non fastopen case might or not consume req refcount, so tcp_v6_hnd_req() & tcp_v4_hnd_req() have to properly handle this. (Same remark for dccp_check_req() and its callers) inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is called 4 times in tcp and 3 times in dccp. Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 3月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
dccp_v6_err() can restrict lookups to ehash table, and not to listeners. Note this patch creates the infrastructure, but this means that ICMP messages for request sockets are ignored until complete conversion. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 3月, 2015 2 次提交
-
-
由 Eric Dumazet 提交于
One of the major issue for TCP is the SYNACK rtx handling, done by inet_csk_reqsk_queue_prune(), fired by the keepalive timer of a TCP_LISTEN socket. This function runs for awful long times, with socket lock held, meaning that other cpus needing this lock have to spin for hundred of ms. SYNACK are sent in huge bursts, likely to cause severe drops anyway. This model was OK 15 years ago when memory was very tight. We now can afford to have a timer per request sock. Timer invocations no longer need to lock the listener, and can be run from all cpus in parallel. With following patch increasing somaxconn width to 32 bits, I tested a listener with more than 4 million active request sockets, and a steady SYNFLOOD of ~200,000 SYN per second. Host was sending ~830,000 SYNACK per second. This is ~100 times more what we could achieve before this patch. Later, we will get rid of the listener hash and use ehash instead. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
When request sock are put in ehash table, the whole notion of having a previous request to update dl_next is pointless. Also, following patch will get rid of big purge timer, so we want to delete a request sock without holding listener lock. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 3月, 2015 2 次提交
-
-
由 Eric Dumazet 提交于
We can now use inet_hash() and __inet_hash() instead of private functions. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Intent is to converge IPv4 & IPv6 inet_hash functions to factorize code. IPv4 sockets initialize sk_rcv_saddr and sk_v6_daddr in this patch, thanks to new sk_daddr_set() and sk_rcv_saddr_set() helpers. __inet6_hash can now use sk_ehashfn() instead of a private inet6_sk_ehashfn() and will simply use __inet_hash() in a following patch. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 3月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
listener socket can be used to set net pointer, and will be later used to hold a reference on listener. Add a const qualifier to first argument (struct request_sock_ops *), and factorize all write_pnet(&ireq->ireq_net, sock_net(sk)); Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 3月, 2015 2 次提交
-
-
由 Eric Dumazet 提交于
Before inserting request socks into general hash table, fill their socket family. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
I forgot to update dccp_v6_conn_request() & cookie_v6_check(). They both need to set ireq->ireq_net and ireq->ir_cookie Lets clear ireq->ir_cookie in inet_reqsk_alloc() Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: 33cf7c90 ("net: add real socket cookies") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 10月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
Commit 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses") added a regression for SO_BINDTODEVICE on IPv6. This is because we still use inet6_iif() which expects that IP6 control block is still at the beginning of skb->cb[] This patch adds tcp_v6_iif() helper and uses it where necessary. Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif parameter to it. Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Acked-by: NCong Wang <cwang@twopensource.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 9月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
ipv6_opt_accepted() assumes IP6CB(skb) holds the struct inet6_skb_parm that it needs. Lets not assume this, as TCP stack might use a different place. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 6月, 2014 1 次提交
-
-
由 Octavian Purdila 提交于
Since pktops is only used for IPv6 only and opts is used for IPv4 only, we can move these fields into a union and this allows us to drop the inet6_reqsk_alloc function as after this change it becomes equivalent with inet_reqsk_alloc. This patch also fixes a kmemcheck issue in the IPv6 stack: the flags field was not annotated after a request_sock was allocated. Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 12月, 2013 1 次提交
-
-
由 Hannes Frederic Sowa 提交于
IPV6_PMTU_INTERFACE is the same as IPV6_PMTU_PROBE for ipv6. Add it nontheless for symmetry with IPv4 sockets. Also drop incoming MTU information if this mode is enabled. The additional bit in ipv6_pinfo just eats in the padding behind the bitfield. There are no changes to the layout of the struct at all. Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 12月, 2013 1 次提交
-
-
由 Florent Fourcot 提交于
This patch is following b579035f "ipv6: remove old conditions on flow label sharing" Since there is no reason to restrict a label to a destination, we should not erase the destination value of a socket with the value contained in the flow label storage. This patch allows to really have the same flow label to more than one destination. Signed-off-by: NFlorent Fourcot <florent.fourcot@enst-bretagne.fr> Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 12月, 2013 1 次提交
-
-
由 Steffen Klassert 提交于
FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the posibility to sleep until the needed states are resolved. This code is gone, so FLOWI_FLAG_CAN_SLEEP is not needed anymore. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 11 10月, 2013 1 次提交
-
-
由 Eric Dumazet 提交于
In commit 634fb979 ("inet: includes a sock_common in request_sock") I forgot that the two ports in sock_common do not have same byte order : skc_dport is __be16 (network order), but skc_num is __u16 (host order) So sparse complains because ir_loc_port (mapped into skc_num) is considered as __u16 while it should be __be16 Let rename ir_loc_port to ireq->ir_num (analogy with inet->inet_num), and perform appropriate htons/ntohs conversions. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 10月, 2013 1 次提交
-
-
由 Eric Dumazet 提交于
TCP listener refactoring, part 5 : We want to be able to insert request sockets (SYN_RECV) into main ehash table instead of the per listener hash table to allow RCU lookups and remove listener lock contention. This patch includes the needed struct sock_common in front of struct request_sock This means there is no more inet6_request_sock IPv6 specific structure. Following inet_request_sock fields were renamed as they became macros to reference fields from struct sock_common. Prefix ir_ was chosen to avoid name collisions. loc_port -> ir_loc_port loc_addr -> ir_loc_addr rmt_addr -> ir_rmt_addr rmt_port -> ir_rmt_port iif -> ir_iif Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 10月, 2013 1 次提交
-
-
由 Eric Dumazet 提交于
TCP listener refactoring, part 4 : To speed up inet lookups, we moved IPv4 addresses from inet to struct sock_common Now is time to do the same for IPv6, because it permits us to have fast lookups for all kind of sockets, including upcoming SYN_RECV. Getting IPv6 addresses in TCP lookups currently requires two extra cache lines, plus a dereference (and memory stall). inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6 This patch is way bigger than its IPv4 counter part, because for IPv4, we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6, it's not doable easily. inet6_sk(sk)->daddr becomes sk->sk_v6_daddr inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr at the same offset. We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic macro. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 9月, 2013 1 次提交
-
-
由 Duan Jiong 提交于
DCCP shouldn't be setting sk_err on redirects as it isn't an error condition. it should be doing exactly what tcp is doing and leaving the error handler without touching the socket. Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 3月, 2013 1 次提交
-
-
由 Christoph Paasch 提交于
TCPCT uses option-number 253, reserved for experimental use and should not be used in production environments. Further, TCPCT does not fully implement RFC 6013. As a nice side-effect, removing TCPCT increases TCP's performance for very short flows: Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests for files of 1KB size. before this patch: average (among 7 runs) of 20845.5 Requests/Second after: average (among 7 runs) of 21403.6 Requests/Second Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-