- 10 Oct, 2013 1 commit
-
-
Committed by Eric Dumazet
TCP listener refactoring, part 5:

We want to be able to insert request sockets (SYN_RECV) into the main ehash table instead of the per-listener hash table, to allow RCU lookups and remove listener lock contention.

This patch includes the needed struct sock_common in front of struct request_sock. This means there is no more inet6_request_sock IPv6-specific structure.

The following inet_request_sock fields were renamed, as they became macros that reference fields from struct sock_common. The prefix ir_ was chosen to avoid name collisions.

loc_port -> ir_loc_port
loc_addr -> ir_loc_addr
rmt_addr -> ir_rmt_addr
rmt_port -> ir_rmt_port
iif -> ir_iif

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
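The renaming described above can be pictured with a small userspace sketch. All `*_sketch` types below are hypothetical, heavily reduced stand-ins for the kernel structures, and the exact mapping of each `ir_` macro to a common field is an assumption made for illustration. The point is only that, once the common header sits at the front of the request socket, one lookup routine can read the keys of any socket type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct sock_common. */
struct sock_common_sketch {
    uint32_t skc_daddr;        /* remote IPv4 address */
    uint32_t skc_rcv_saddr;    /* local IPv4 address */
    uint16_t skc_dport;        /* remote port */
    uint16_t skc_num;          /* local port */
    int      skc_bound_dev_if; /* bound interface index */
};

/* The common header is the first member, so a pointer to the request
 * socket can be treated as a pointer to the common lookup keys. */
struct request_sock_sketch {
    struct sock_common_sketch req_common;
    uint8_t num_retrans;
};

struct inet_request_sock_sketch {
    struct request_sock_sketch req;
    uint16_t mss;
};

/* Former fields become macros that reach into the common header.
 * The field chosen for each macro is an assumption of this sketch. */
#define ir_loc_addr req.req_common.skc_rcv_saddr
#define ir_rmt_addr req.req_common.skc_daddr
#define ir_loc_port req.req_common.skc_num
#define ir_rmt_port req.req_common.skc_dport
#define ir_iif      req.req_common.skc_bound_dev_if

/* Generic key comparison that only needs the common header. */
bool common_key_matches(const struct sock_common_sketch *c,
                        uint32_t daddr, uint16_t dport,
                        uint32_t saddr, uint16_t num)
{
    return c->skc_daddr == daddr && c->skc_dport == dport &&
           c->skc_rcv_saddr == saddr && c->skc_num == num;
}
```

Code that previously wrote `ireq->loc_port` keeps the same shape with `ireq->ir_loc_port`, while the storage moves into the shared header.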
-
- 09 Oct, 2013 2 commits
-
-
Committed by Eric Dumazet
TCP listener refactoring, part 4:

To speed up inet lookups, we moved IPv4 addresses from inet to struct sock_common.

Now is the time to do the same for IPv6, because it permits us to have fast lookups for all kinds of sockets, including the upcoming SYN_RECV.

Getting IPv6 addresses in TCP lookups currently requires two extra cache lines, plus a dereference (and memory stall). inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6.

This patch is way bigger than its IPv4 counterpart, because for IPv4 we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6 this is not easily doable.

inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

Timewait sockets also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr at the same offset.

We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic macro.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
TCP listener refactoring, part 3:

Our goal is to hash SYN_RECV sockets into the main ehash for fast lookup and parallel SYN processing.

The current inet_ehash_bucket contains two chains, one for ESTABLISHED (and friend states) sockets, another for TIME_WAIT sockets only.

As the hash table is sized to get at most one socket per bucket, it makes little sense to have a separate twchain, as it makes the lookup slightly more complicated and doubles hash table memory usage.

If we make sure all socket types have the lookup keys at the same offsets, we can use a generic and faster lookup. It turns out TIME_WAIT and ESTABLISHED sockets already have common lookup fields for IPv4.

[ INET_TW_MATCH() is no longer needed ]

I'll provide a follow-up to factorize the IPv6 lookup as well, to remove INET6_TW_MATCH().

This way, SYN_RECV pseudo sockets will be supported the same way.

A new sock_gen_put() helper is added, doing either a sock_put() or inet_twsk_put() [ and will support SYN_RECV later ].

Note this helper should only be called in the real slow path, when the RCU lookup found a socket that was moved to another identity (freed/reused immediately), but could eventually be used in other contexts, like sock_edemux().

Before patch:

dmesg | grep "TCP established"
TCP established hash table entries: 524288 (order: 11, 8388608 bytes)

After patch:

TCP established hash table entries: 524288 (order: 10, 4194304 bytes)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
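The before/after table sizes quoted above follow directly from the bucket layout. The sketch below uses hypothetical `*_sketch` structures with one pointer per chain head (an assumption standing in for the kernel's list heads) and reproduces the arithmetic for 524288 entries on a machine with 8-byte pointers.

```c
#include <stddef.h>

/* Hypothetical chain head: a single pointer, as assumed for this sketch. */
struct chain_head_sketch {
    void *first;
};

/* Old layout: one chain for established sockets, one for TIME_WAIT. */
struct two_chain_bucket_sketch {
    struct chain_head_sketch chain;
    struct chain_head_sketch twchain;
};

/* New layout: a single chain shared by all socket types. */
struct one_chain_bucket_sketch {
    struct chain_head_sketch chain;
};

/* Total table size in bytes for a given bucket count and bucket size. */
size_t ehash_bytes_sketch(size_t entries, size_t bucket_size)
{
    return entries * bucket_size;
}
```

With 8-byte pointers the old bucket is 16 bytes and the new one 8 bytes, so 524288 entries take 8388608 and 4194304 bytes respectively, matching the dmesg lines in the commit message.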
-
- 19 Sep, 2013 1 commit
-
-
Committed by Duan Jiong
DCCP shouldn't be setting sk_err on redirects, as it isn't an error condition. It should be doing exactly what TCP is doing and leaving the error handler without touching the socket.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 25 Jul, 2013 1 commit
-
-
Committed by Eric Dumazet
Several call sites use the following hardcoded condition:

sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)

Let's use a helper, because TCP_NOTSENT_LOWAT support will change this condition for TCP sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
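A minimal userspace sketch of such a helper follows. The struct and function names are hypothetical, and the definitions of the two quantities (free space as send buffer minus queued bytes, minimum as half the queued bytes) are assumptions made for illustration, not a copy of the kernel code.

```c
#include <stdbool.h>

/* Hypothetical reduced socket: only the two fields the condition needs. */
struct stream_sock_sketch {
    int sndbuf;      /* send buffer size in bytes */
    int wmem_queued; /* bytes currently queued for sending */
};

/* Free space left in the send buffer (assumed definition). */
int stream_wspace_sketch(const struct stream_sock_sketch *sk)
{
    return sk->sndbuf - sk->wmem_queued;
}

/* Minimum free space before the socket counts as writeable (assumed). */
int stream_min_wspace_sketch(const struct stream_sock_sketch *sk)
{
    return sk->wmem_queued >> 1;
}

/* The open-coded condition wrapped in one helper, so that a later
 * change to the rule has a single place to live. */
bool stream_is_writeable_sketch(const struct stream_sock_sketch *sk)
{
    return stream_wspace_sketch(sk) >= stream_min_wspace_sketch(sk);
}
```

Call sites then test `stream_is_writeable_sketch(sk)` instead of repeating the comparison.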
-
- 18 Mar, 2013 1 commit
-
-
Committed by Christoph Paasch
TCPCT uses option number 253, which is reserved for experimental use and should not be used in production environments. Further, TCPCT does not fully implement RFC 6013.

As a nice side effect, removing TCPCT increases TCP's performance for very short flows:

Doing an apache-benchmark with -c 100 -n 100000, sending HTTP requests for files of 1KB size.

before this patch: average (among 7 runs) of 20845.5 Requests/Second
after: average (among 7 runs) of 21403.6 Requests/Second

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 Feb, 2013 2 commits
-
-
Committed by Gao feng
proc_net_remove is only used to remove proc entries under /proc/net; it is not a general function for removing proc entries of a netns. If we want to remove proc entries under /proc/net/stat/, we still need to call remove_proc_entry.

This patch uses remove_proc_entry to replace proc_net_remove. We can remove proc_net_remove after this patch.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Gao feng
Right now, some modules such as bonding use proc_create to create proc entries under /proc/net/, and other modules such as ipv4 use proc_net_fops_create.

This looks a little chaotic. This patch changes all uses of proc_net_fops_create to proc_create. We can remove proc_net_fops_create after this patch.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 Jan, 2013 2 commits
-
-
Committed by Kees Cook
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Gerrit Renker <gerrit@erg.abdn.ac.uk>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
-
Committed by Kees Cook
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Gerrit Renker <gerrit@erg.abdn.ac.uk>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
-
- 15 Dec, 2012 1 commit
-
-
Committed by Christoph Paasch
If in either of the above functions inet_csk_route_child_sock() or __inet_inherit_port() fails, the newsk will not be freed:

unreferenced object 0xffff88022e8a92c0 (size 1592):
  comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
  hex dump (first 32 bytes):
    0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00  ................
    02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e
    [<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5
    [<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd
    [<ffffffff8149b784>] sk_clone_lock+0x16/0x21e
    [<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b
    [<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481
    [<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b
    [<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416
    [<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc
    [<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701
    [<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4
    [<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f
    [<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233
    [<ffffffff814cee68>] ip_rcv+0x217/0x267
    [<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553
    [<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82

This happens because sk_clone_lock initializes sk_refcnt to 2, and thus a single sock_put() is not enough to free the memory. Additionally, things like xfrm, memcg, cookie_values,... may have been initialized. We have to free them properly.

This is fixed by forcing a call to tcp_done(), ending up in inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary, because it ends up doing all the cleanup on xfrm, memcg, cookie_values,...

Before calling tcp_done, we have to set the socket to SOCK_DEAD, to force it to enter inet_csk_destroy_sock. To avoid the warning in inet_csk_destroy_sock, inet_num has to be set to 0. As inet_csk_destroy_sock does a dec on orphan_count, we first have to increase it.

Calling tcp_done() allows us to remove the calls to tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().

A similar approach is taken for dccp by calling dccp_done().

This has been in the kernel since 093d2823 (tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()), thus since version >= 2.6.37.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
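The reference-count reasoning above can be followed with a toy model. Everything here is a hypothetical stand-in: `sock_sketch` is not the kernel's socket, and `forced_close_sketch()` only assumes that the error path drops one reference itself and lets the "done" step drop the final one.

```c
#include <stdbool.h>

/* Hypothetical toy socket: a reference count plus two state flags. */
struct sock_sketch {
    int  refcnt;
    bool dead;
    bool freed;
};

/* Models the clone step: the new socket starts with two references. */
void sk_clone_sketch(struct sock_sketch *newsk)
{
    newsk->refcnt = 2;
    newsk->dead = false;
    newsk->freed = false;
}

/* Drops one reference; memory is released only when the count hits 0. */
void sock_put_sketch(struct sock_sketch *sk)
{
    if (--sk->refcnt == 0)
        sk->freed = true;
}

/* Stand-in for the "done" path: marks the socket dead so the destroy
 * step runs, which performs the final put. */
void done_sketch(struct sock_sketch *sk)
{
    sk->dead = true;
    sock_put_sketch(sk);
}

/* Assumed shape of the fixed error path: one explicit put, then the
 * "done" step for the second reference and the remaining cleanup. */
void forced_close_sketch(struct sock_sketch *sk)
{
    sock_put_sketch(sk);
    done_sketch(sk);
}
```

A single `sock_put_sketch()` after cloning leaves `refcnt` at 1 and `freed` false, which is the leak kmemleak reported; `forced_close_sketch()` brings the count to 0.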
-
- 04 Nov, 2012 1 commit
-
-
Committed by Eric Dumazet
For passive TCP connections using the TCP_DEFER_ACCEPT facility, we incorrectly increment req->retrans each time the timeout triggers while no SYNACK is sent.

SYNACKs are not sent for TCP_DEFER_ACCEPT requests that were established (for which we received the ACK from the client). Only the last SYNACK is sent, so that we can receive an ACK from the client again and move the req into the accept queue. We plan to change this later to avoid the useless retransmit (and potential problem, as this SYNACK could be lost).

TCP_INFO later gives wrong information to the user, claiming imaginary retransmits.

Decouple the req->retrans field into two independent fields:

num_retrans : number of retransmits
num_timeout : number of timeouts

num_timeout is the counter that is incremented at each timeout, regardless of an actual SYNACK being sent or not, and is used to compute the exponential timeout.

Introduce the inet_rtx_syn_ack() helper to increment num_retrans only if ->rtx_syn_ack() succeeded.

Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans when we re-send a SYNACK in answer to a (retransmitted) SYN. Prior to this patch, we were not counting these retransmits.

Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS only if a SYNACK packet was successfully queued.

Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Elliott Hughes <enh@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
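The split between the two counters can be sketched in userspace as follows. The struct, the callback type and every `*_sketch` function are hypothetical; only the rule comes from the commit message: count a retransmit when the send succeeded, count a timeout every time the timer fires.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced request socket with the two decoupled counters. */
struct req_sketch {
    uint8_t num_retrans; /* SYNACKs actually re-sent */
    uint8_t num_timeout; /* timer expirations, sent or not */
};

/* Hypothetical retransmit callback: returns 0 on success. */
typedef int (*rtx_fn_sketch)(struct req_sketch *req);

int rtx_ok_sketch(struct req_sketch *req)   { (void)req; return 0; }
int rtx_fail_sketch(struct req_sketch *req) { (void)req; return -1; }

/* Increments num_retrans only if the retransmit call succeeded. */
int rtx_syn_ack_sketch(struct req_sketch *req, rtx_fn_sketch rtx)
{
    int err = rtx(req);

    if (!err)
        req->num_retrans++;
    return err;
}

/* Timer handler: num_timeout always advances; a SYNACK is attempted
 * only when the caller decides one should be sent. */
void on_timeout_sketch(struct req_sketch *req, rtx_fn_sketch rtx,
                       bool send_synack)
{
    if (send_synack)
        rtx_syn_ack_sketch(req, rtx);
    req->num_timeout++;
}
```

A deferred request whose timer fires without sending anything ends up with `num_timeout` incremented and `num_retrans` unchanged, so a report based on `num_retrans` no longer shows imaginary retransmits.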
-
- 16 Aug, 2012 2 commits
-
-
Committed by Mathias Krause
The CCID3 code fails to initialize the trailing padding bytes of struct tfrc_tx_info added for alignment on 64-bit architectures. It thus potentially leaks four bytes of kernel stack via the getsockopt() syscall. Add an explicit memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
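The pattern of the fix can be shown with a small userspace sketch. `info_sketch` is a hypothetical structure, not struct tfrc_tx_info; it is laid out so that common 64-bit ABIs add four trailing padding bytes. The sketch relies on the usual compiler behaviour that member stores leave padding untouched, which the C standard does not strictly guarantee.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical info structure. On common 64-bit ABIs the compiler adds
 * 4 bytes of trailing padding so the size is a multiple of 8. */
struct info_sketch {
    uint64_t rate;
    uint32_t rtt;
};

/* Zero the whole object first, then fill the fields, so that padding
 * bytes never carry stale stack contents to the caller. */
void fill_info_sketch(struct info_sketch *out, uint64_t rate, uint32_t rtt)
{
    memset(out, 0, sizeof(*out));
    out->rate = rate;
    out->rtt = rtt;
}
```

Without the `memset()`, copying `sizeof(struct info_sketch)` bytes out to a caller would include whatever happened to be in the padding.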
-
Committed by Mathias Krause
ccid_hc_rx_getsockopt() and ccid_hc_tx_getsockopt() might be called with a NULL ccid pointer, leading to a NULL pointer dereference. This could lead to a privilege escalation if the attacker is able to map page 0 and prepare it with a fake ccid_ops pointer.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
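A guard of the kind described can be sketched as below. The types, the handler and the choice of `-ENOPROTOOPT` as the fallback return value are assumptions made for illustration; the commit message itself only states that the NULL case must be handled.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical ops table with an optional getsockopt handler. */
struct ccid_ops_sketch {
    int (*hc_rx_getsockopt)(int optname);
};

/* Hypothetical ccid object pointing at its ops table. */
struct ccid_sketch {
    const struct ccid_ops_sketch *ops;
};

/* Example handler used only to exercise the sketch. */
int always_ok_sketch(int optname)
{
    (void)optname;
    return 0;
}

/* Check the ccid pointer before touching its ops table, and fall back
 * to an error code (assumed here) when there is nothing to call. */
int ccid_rx_getsockopt_sketch(const struct ccid_sketch *ccid, int optname)
{
    int rc = -ENOPROTOOPT;

    if (ccid != NULL && ccid->ops->hc_rx_getsockopt != NULL)
        rc = ccid->ops->hc_rx_getsockopt(optname);
    return rc;
}
```

The same shape would apply to the transmit-side function.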
-
- 24 Jul, 2012 1 commit
-
-
Committed by David S. Miller
Use inet_iif() consistently, and for TCP record the input interface of the cached RX dst in the inet sock.

rt->rt_iif is going to be encoded differently, so that we can legitimately cache input routes in the FIB info more aggressively. When the input interface is "use SKB device index", rt->rt_iif will be set to zero.

This forces us to move the TCP RX dst cache installation into the ipv4-specific code, and as well it should, since doing the route caching for ipv6 is pointless at the moment because it is not inspected in the ipv6 input paths yet.

Also, remove the unlikely on dst->obsolete; all ipv4 dsts have obsolete set to a non-zero value to force invocation of the check callback.

Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 21 Jul, 2012 1 commit
-
-
Committed by David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 Jul, 2012 1 commit
-
-
Committed by David S. Miller
This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key, we will do a fib_lookup() and create/update the persistent entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Jul, 2012 2 commits
-
-
Committed by David S. Miller
This is the ipv6 version of inet_csk_update_pmtu().

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by David S. Miller
This abstracts away the call to dst_ops->update_pmtu() so that we can transparently handle the fact that, in the future, the dst itself can be invalidated by the PMTU update (when we have non-host routes cached in sockets). So we try to rebuild the socket cached route after the method invocation if necessary.

This isn't used by SCTP because it needs to cache dsts per-transport, and thus will need its own local version of this helper.

Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 Jul, 2012 3 commits
-
-
Committed by David S. Miller
No longer necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Jul, 2012 1 commit
-
-
Committed by Ben Hutchings
Fix incorrect start markers, wrapped summary lines, missing section breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 05 Jul, 2012 1 commit
-
-
Committed by RongQing.Li
opt always equals np->opts, so it is pointless to define opt, check whether opt differs from np->opts, and then try to free opt.

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Jun, 2012 1 commit
-
-
Committed by Eric Dumazet
Don't cache the output dst for syncookies, as this adds pressure on the IP route cache and the RCU subsystem for no gain.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Jun, 2012 1 commit
-
-
Committed by David S. Miller
One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts to handle the error pass the 32-bit info cookie in network byte order, whereas ipv4 passes it around in host byte order.

Like the ipv4 side, we have two helper functions: one for when we have a socket context and one for when we do not.

ip6ip6 tunnels are not handled here, because they handle PMTU events by essentially relaying another ICMP packet-too-big message back to the original sender.

This patch allows us to get rid of rt6_do_pmtu_disc(). It handles all kinds of situations that simply cannot happen when we do the PMTU update directly using a fully resolved route.

In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very likely be removed or changed into a BUG_ON() check. We should never have a prefixed ipv6 route when we get there.

Another piece of strange history here is that TCP and DCCP, unlike in ipv4, never invoke the update_pmtu() method from their ICMP error handlers. This is astonishing, since this is the context in which we can make the most accurate PMTU update: we have a fully connected socket and an associated cached socket route.

Signed-off-by: David S. Miller <davem@davemloft.net>
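The byte-order point in the first paragraph can be shown with a short userspace sketch. The function name is hypothetical; `ntohl()` and `htonl()` are the standard POSIX conversions from `<arpa/inet.h>`. It assumes, as the message describes, that the IPv6 path receives the 32-bit info value in network byte order and must convert it before using it as an MTU.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Converts the 32-bit info cookie, assumed to arrive in network byte
 * order on the IPv6 path, into a host-order MTU value. */
uint32_t mtu_from_icmpv6_info_sketch(uint32_t info_be)
{
    return ntohl(info_be);
}
```

For example, a cookie built as `htonl(1280)` converts back to 1280 on any host, whereas using the raw value directly would be wrong on little-endian machines.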
-
- 17 May, 2012 1 commit
-
-
Committed by Eric Dumazet
bool/const conversions where possible
__inline__ -> inline
space cleanups

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 21 Apr, 2012 2 commits
-
-
Committed by Eric W. Biederman
This results in code with less boilerplate that is a bit easier to read.

Additionally, it stops us from using compatibility code in the sysctl core, hastening the day when the compatibility code can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric W. Biederman
This makes it clearer which sysctls are relative to your current network namespace.

This makes it a little less error prone by not exposing sysctls for the initial network namespace in other namespaces.

This is the same way we handle all of our other network interfaces to userspace, and I can't honestly remember why we didn't do this for sysctls right from the start.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 20 Apr, 2012 1 commit
-
-
Committed by Eric Dumazet
When we need to clone an skb, we don't drop a packet. Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Apr, 2012 1 commit
-
-
Committed by Eric Dumazet
Use of "unsigned int" is preferred to bare "unsigned" in the net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 Apr, 2012 1 commit
-
-
Committed by Eric Dumazet
There are two struct request_sock_ops providers, tcp and dccp.

inet_csk_reqsk_queue_prune() can avoid testing syn_ack_timeout being NULL if we make it non-NULL like syn_ack_timeout

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 04 Mar, 2012 2 commits
-
-
Committed by Samuel Jero
This fixes a bug in the sequence number validation during the initial handshake.

The code did not treat the initial sequence numbers ISS and ISR as read-only and did not keep state for GSR and GSS as required by the specification. This causes problems with retransmissions during the initial handshake, causing the budding connection to be reset.

This patch now treats ISS/ISR as read-only and tracks GSS/GSR as required.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
-
Committed by Gerrit Renker
This replaces an unjustified BUG_ON(), which could get triggered under normal conditions: X_calc can be 0 when p > 0. X would in this case be set to the minimum, s/t_mbi. Its replacement avoids t_ipi = 0 (unbounded sending rate).

Thanks to Jordi, Victor and Xavier who reported this.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>
-
- 12 Jan, 2012 1 commit
-
-
Committed by Pavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 20 Dec, 2011 2 commits
-
-
Committed by Rusty Russell
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick.

It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break in the next kernel version.

(Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Rusty Russell
DaveM said:
   Please, this kind of stuff rots forever and not using bool properly
   drives me crazy.

Joe Perches <joe@perches.com> gave me the spatch script:

    @@
    bool b;
    @@
    -b = 0
    +b = false
    @@
    bool b;
    @@
    -b = 1
    +b = true

I merely installed coccinelle, read the documentation and took credit.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 Dec, 2011 1 commit
-
-
Committed by Pavel Emelyanov
I've made a mistake when fixing the sock_/inet_diag aliases :(

1. The sock_diag layer should request the family-based alias, not just the IPPROTO_IP one;
2. The inet_diag layer should request the AF_INET+protocol alias, not just the protocol one.

Thus fix this.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 Dec, 2011 1 commit
-
-
Committed by Eric Dumazet
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 Dec, 2011 1 commit
-
-
Committed by Pavel Emelyanov
Introduce two callbacks in inet_diag_handler -- one for dumping all sockets (with filters) and the other one for dumping a single sk.

Replace direct calls to icsk handlers with indirect calls to callbacks provided by handlers.

Make the existing TCP and DCCP handlers use the provided helpers for icsk-s. The UDP diag module will provide its own.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-