- 12 3月, 2012 1 次提交
-
-
由 Joe Perches 提交于
Use a more current kernel messaging style. Convert a printk block to print_hex_dump. Coalesce formats, align arguments. Use %s, __func__ instead of embedding function names. Some messages that were prefixed with <foo>_close are now prefixed with <foo>_fini. Some ah4 and esp messages are now not prefixed with "ip ". The intent of this patch is to later add something like #define pr_fmt(fmt) "IPv4: " fmt. to standardize the output messages. Text size is trivially reduced. (x86-32 allyesconfig) $ size net/ipv4/built-in.o* text data bss dec hex filename 887888 31558 249696 1169142 11d6f6 net/ipv4/built-in.o.new 887934 31558 249800 1169292 11d78c net/ipv4/built-in.o.old Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 3月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
commit a8afca03 (tcp: md5: protects md5sig_info with RCU) added a lockdep splat in tcp_md5_do_lookup() in case a timer fires a tcp retransmit. At this point, socket lock is owned by the sofirq handler, not the user, so we should adjust a bit the lockdep condition, as we dont hold rcu_read_lock(). Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Reported-by: NValdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 2月, 2012 1 次提交
-
-
由 Jiri Benc 提交于
Currently, it is not easily possible to get TOS/DSCP value of packets from an incoming TCP stream. The mechanism is there, IP_PKTOPTIONS getsockopt with IP_RECVTOS set, the same way as incoming TTL can be queried. This is not actually implemented for TOS, though. This patch adds this functionality, both for IPv4 (IP_PKTOPTIONS) and IPv6 (IPV6_2292PKTOPTIONS). For IPv4, like in the IP_RECVTTL case, the value of the TOS field is stored from the other party's ACK. This is needed for proxies which require DSCP transparency. One such example is at http://zph.bratcheda.org/. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 2月, 2012 1 次提交
-
-
由 Shawn Lu 提交于
Binding RST packet outgoing interface to incoming interface for tcp v4 when there is no socket associate with it. when sk is not NULL, using sk->sk_bound_dev_if instead. (suggested by Eric Dumazet). This has few benefits: 1. tcp_v6_send_reset already did that. 2. This helps tcp connect with SO_BINDTODEVICE set. When connection is lost, we still able to sending out RST using same interface. 3. we are sending reply, it is most likely to be succeed if iif is used Signed-off-by: NShawn Lu <shawn.lu@ericsson.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 2月, 2012 1 次提交
-
-
由 Shawn Lu 提交于
TCP RST mechanism is broken in TCP md5(RFC2385). When connection is gone, md5 key is lost, sending RST without md5 hash is deem to ignored by peer. This can be a problem since RST help protocal like bgp to fast recove from peer crash. In most case, users of tcp md5, such as bgp and ldp, have listener on both sides to accept connection from peer. md5 keys for peers are saved in listening socket. There are two cases in finding md5 key when connection is lost: 1.Passive receive RST: The message is send to well known port, tcp will associate it with listner. md5 key is gotten from listener. 2.Active receive RST (no sock): The message is send to ative side, there is no socket associated with the message. In this case, finding listener from source port, then find md5 key from listener. we are not loosing sercuriy here: packet is checked with md5 hash. No RST is generated if md5 hash doesn't match or no md5 key can be found. Signed-off-by: NShawn Lu <shawn.lu@ericsson.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 2月, 2012 4 次提交
-
-
由 Eric Dumazet 提交于
This patch makes sure we use appropriate memory barriers before publishing tp->md5sig_info, allowing tcp_md5_do_lookup() being used from tcp_v4_send_reset() without holding socket lock (upcoming patch from Shawn Lu) Note we also need to respect rcu grace period before its freeing, since we can free socket without this grace period thanks to SLAB_DESTROY_BY_RCU Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Shawn Lu <shawn.lu@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
There is no limit on number of MD5 keys an application can attach to a tcp socket. This patch adds a per tcp socket limit based on /proc/sys/net/core/optmem_max With current default optmem_max values, this allows about 150 keys on 64bit arches, and 88 keys on 32bit arches. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
In order to be able to support proper RST messages for TCP MD5 flows, we need to allow access to MD5 keys without locking listener socket. This conversion is a nice cleanup, and shrinks size of timewait sockets by 80 bytes. IPv6 code reuses generic code found in IPv4 instead of duplicating it. Control path uses GFP_KERNEL allocations instead of GFP_ATOMIC. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Shawn Lu <shawn.lu@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We no longer use md5_add() method from struct tcp_sock_af_ops Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 1月, 2012 1 次提交
-
-
由 shawnlu 提交于
md5 key is added in socket through remote address. remote address should be used in finding md5 key when sending out reset packet. Signed-off-by: Nshawnlu <shawn.lu@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 12月, 2011 3 次提交
-
-
由 Glauber Costa 提交于
This patch allows each namespace to independently set up its levels for tcp memory pressure thresholds. This patch alone does not buy much: we need to make this values per group of process somehow. This is achieved in the patches that follows in this patchset. Signed-off-by: NGlauber Costa <glommer@parallels.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Glauber Costa 提交于
This patch introduces memory pressure controls for the tcp protocol. It uses the generic socket memory pressure code introduced in earlier patches, and fills in the necessary data in cg_proto struct. Signed-off-by: NGlauber Costa <glommer@parallels.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Glauber Costa 提交于
This patch replaces all uses of struct sock fields' memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem to acessor macros. Those macros can either receive a socket argument, or a mem_cgroup argument, depending on the context they live in. Since we're only doing a macro wrapping here, no performance impact at all is expected in the case where we don't have cgroups disabled. Signed-off-by: NGlauber Costa <glommer@parallels.com> Reviewed-by: NHiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 12月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Rick Jones reported that TCP_CONGESTION sockopt performed on a listener was ignored for its children sockets : right after accept() the congestion control for new socket is the system default one. This seems an oversight of the initial design (quoted from Stephen) Based on prior investigation and patch from Rick. Reported-by: NRick Jones <rick.jones2@hp.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Yuchung Cheng <ycheng@google.com> Tested-by: NRick Jones <rick.jones2@hp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 11月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Simon Kirby reported divides by zero errors in __tcp_select_window() This happens when inet_csk_route_child_sock() returns a NULL pointer : We free new socket while we eventually armed keepalive timer in tcp_create_openreq_child() Fix this by a call to tcp_clear_xmit_timers() [ This is a followup to commit 918eb399 (net: add missing bh_unlock_sock() calls) ] Reported-by: NSimon Kirby <sim@hostway.ca> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Tested-by: NSimon Kirby <sim@hostway.ca> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 11月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Simon Kirby reported lockdep warnings and following messages : [104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102? [104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102? Problem comes from commit 0e734419 (ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.) If inet_csk_route_child_sock() returns NULL, we should release socket lock before freeing it. Another lock imbalance exists if __inet_inherit_port() returns an error since commit 093d2823 ( tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()) a backport is also needed for >= 2.6.37 kernels. Reported-by: NSimon Kirby <sim@hostway.ca> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Tested-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Balazs Scheidler <bazsi@balabit.hu> CC: KOVACS Krisztian <hidden@balabit.hu> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NSimon Kirby <sim@hostway.ca> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 11月, 2011 1 次提交
-
-
由 Arjan van de Ven 提交于
the tcp and udp code creates a set of struct file_operations at runtime while it can also be done at compile time, with the added benefit of then having these file operations be const. the trickiest part was to get the "THIS_MODULE" reference right; the naive method of declaring a struct in the place of registration would not work for this reason. Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 10月, 2011 2 次提交
-
-
由 Eric Dumazet 提交于
There is a long standing bug in linux tcp stack, about ACK messages sent on behalf of TIME_WAIT sockets. In the IP header of the ACK message, we choose to reflect TOS field of incoming message, and this might break some setups. Example of things that were broken : - Routing using TOS as a selector - Firewalls - Trafic classification / shaping We now remember in timewait structure the inet tos field and use it in ACK generation, and route lookup. Notes : - We still reflect incoming TOS in RST messages. - We could extend MuraliRaja Muniraju patch to report TOS value in netlink messages for TIME_WAIT sockets. - A patch is needed for IPv6 Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Now tcp_md5_hash_header() has a const tcphdr argument, we can add more const attributes to callers. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 10月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Adding const qualifiers to pointers can ease code review, and spot some bugs. It might allow compiler to optimize code further. For example, is it legal to temporary write a null cksum into tcphdr in tcp_md5_hash_header() ? I am afraid a sniffer could catch the temporary null value... Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 10月, 2011 1 次提交
-
-
由 Yan, Zheng 提交于
tcp_v4_clear_md5_list() assumes that multiple tcp md5sig peers only hold one reference to md5sig_pool. but tcp_v4_md5_do_add() increases use count of md5sig_pool for each peer. This patch makes tcp_v4_md5_do_add() only increases use count for the first tcp md5sig peer. Signed-off-by: NZheng Yan <zheng.z.yan@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 9月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
struct tcp_skb_cb contains a "flags" field containing either tcp flags or IP dsfield depending on context (input or output path) Introduce ip_dsfield to make the difference clear and ease maintenance. If later we want to save space, we can union flags/ip_dsfield Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 9月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
"Possible SYN flooding on port xxxx " messages can fill logs on servers. Change logic to log the message only once per listener, and add two new SNMP counters to track : TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client TCPReqQFullDrop : number of times a SYN request was dropped because syncookies were not enabled. Based on a prior patch from Tom Herbert, and suggestions from David. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 8月, 2011 1 次提交
-
-
由 Tom Herbert 提交于
The l4_rxhash flag was added to the skb structure to indicate that the rxhash value was computed over the 4 tuple for the packet which includes the port information in the encapsulated transport packet. This is used by the stack to preserve the rxhash value in __skb_rx_tunnel. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 8月, 2011 1 次提交
-
-
由 David S. Miller 提交于
Computers have become a lot faster since we compromised on the partial MD4 hash which we use currently for performance reasons. MD5 is a much safer choice, and is inline with both RFC1948 and other ISS generators (OpenBSD, Solaris, etc.) Furthermore, only having 24-bits of the sequence number be truly unpredictable is a very serious limitation. So the periodic regeneration and 8-bit counter have been removed. We compute and use a full 32-bit sequence number. For ipv6, DCCP was found to use a 32-bit truncated initial sequence number (it needs 43-bits) and that is fixed here as well. Reported-by: NDan Kaminsky <dan@doxpara.com> Tested-by: NWilly Tarreau <w@1wt.eu> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 6月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Le jeudi 16 juin 2011 à 23:38 -0400, David Miller a écrit : > From: Ben Hutchings <bhutchings@solarflare.com> > Date: Fri, 17 Jun 2011 00:50:46 +0100 > > > On Wed, 2011-06-15 at 04:15 +0200, Eric Dumazet wrote: > >> @@ -1594,6 +1594,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb) > >> goto discard; > >> > >> if (nsk != sk) { > >> + sock_rps_save_rxhash(nsk, skb->rxhash); > >> if (tcp_child_process(sk, nsk, skb)) { > >> rsk = nsk; > >> goto reset; > >> > > > > I haven't tried this, but it looks reasonable to me. > > > > What about IPv6? The logic in tcp_v6_do_rcv() looks very similar. > > Indeed ipv6 side needs the same fix. > > Eric please add that part and resubmit. And in fact I might stick > this into net-2.6 instead of net-next-2.6 > OK, here is the net-2.6 based one then, thanks ! [PATCH v2] net: rfs: enable RFS before first data packet is received First packet received on a passive tcp flow is not correctly RFS steered. One sock_rps_record_flow() call is missing in inet_accept() But before that, we also must record rxhash when child socket is setup. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: NDavid S. Miller <davem@conan.davemloft.net>
-
- 09 6月, 2011 1 次提交
-
-
由 Jerry Chu 提交于
This patch lowers the default initRTO from 3secs to 1sec per RFC2988bis. It falls back to 3secs if the SYN or SYN-ACK packet has been retransmitted, AND the TCP timestamp option is not on. It also adds support to take RTT sample during 3WHS on the passive open side, just like its active open counterpart, and uses it, if valid, to seed the initRTO for the data transmission phase. The patch also resets ssthresh to its initial default at the beginning of the data transmission phase, and reduces cwnd to 1 if there has been MORE THAN ONE retransmission during 3WHS per RFC5681. Signed-off-by: NH.K. Jerry Chu <hkchu@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 5月, 2011 1 次提交
-
-
由 Dan Rosenberg 提交于
The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". The supporting code for kptr_restrict and %pK are currently in the -mm tree. This patch converts users of %p in net/ to %pK. Cases of printing pointers to the syslog are not covered, since this would eliminate useful information for postmortem debugging and the reading of the syslog is already optionally protected by the dmesg_restrict sysctl. Signed-off-by: NDan Rosenberg <drosenberg@vsecurity.com> Cc: James Morris <jmorris@namei.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Kees Cook <kees.cook@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 5月, 2011 3 次提交
-
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This will next trickle down to rt_bind_peer(). Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This way the caller can get at the fully resolved fl4->{daddr,saddr} etc. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 5月, 2011 1 次提交
-
-
由 David S. Miller 提交于
This eliminates an access to rt->rt_src. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 5月, 2011 3 次提交
-
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Operation order is now transposed, we first create the child socket then we try to hook up the route. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Since this is invoked from inet_stream_connect() the socket is locked and therefore this usage is safe. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 4月, 2011 3 次提交
-
-
由 David S. Miller 提交于
Now that output route lookups update the flow with destination address selection, we can fetch it from fl4->daddr instead of rt->rt_dst Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Now that output route lookups update the flow with source address selection, we can fetch it from fl4->saddr instead of rt->rt_src Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We lack proper synchronization to manipulate inet->opt ip_options Problem is ip_make_skb() calls ip_setup_cork() and ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options), without any protection against another thread manipulating inet->opt. Another thread can change inet->opt pointer and free old one under us. Use RCU to protect inet->opt (changed to inet->inet_opt). Instead of handling atomic refcounts, just copy ip_options when necessary, to avoid cache line dirtying. We cant insert an rcu_head in struct ip_options since its included in skb->cb[], so this patch is large because I had to introduce a new ip_options_rcu structure. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 4月, 2011 1 次提交
-
-
由 David S. Miller 提交于
These functions are used together as a unit for route resolution during connect(). They address the chicken-and-egg problem that exists when ports need to be allocated during connect() processing, yet such port allocations require addressing information from the routing code. It's currently more heavy handed than it needs to be, and in particular we allocate and initialize a flow object twice. Let the callers provide the on-stack flow object. That way we only need to initialize it once in the ip_route_connect() call. Later, if ip_route_newports() needs to do anything, it re-uses that flow object as-is except for the ports which it updates before the route re-lookup. Also, describe why this set of facilities are needed and how it works in a big comment. Signed-off-by: NDavid S. Miller <davem@davemloft.net> Reviewed-by: NEric Dumazet <eric.dumazet@gmail.com>
-
- 23 4月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-