- 22 6月, 2011 3 次提交
-
-
由 Paul Gortmaker 提交于
There are enough instances of this: iph->frag_off & htons(IP_MF | IP_OFFSET) that a helper function is probably warranted. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Satoru Moriya 提交于
This patch adds a tracepoint to __udp_queue_rcv_skb to get the return value of ip_queue_rcv_skb. It indicates why kernel drops a packet at this point. ip_queue_rcv_skb returns following values in the packet drop case: rcvbuf is full : -ENOMEM sk_filter returns error : -EINVAL, -EACCESS, -ENOMEM, etc. __sk_mem_schedule returns error: -ENOBUF Signed-off-by: NSatoru Moriya <satoru.moriya@hds.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Juhl 提交于
It was suggested by "make versioncheck" that the follwing includes of linux/version.h are redundant: /home/jj/src/linux-2.6/net/caif/caif_dev.c: 14 linux/version.h not needed. /home/jj/src/linux-2.6/net/caif/chnl_net.c: 10 linux/version.h not needed. /home/jj/src/linux-2.6/net/ipv4/gre.c: 19 linux/version.h not needed. /home/jj/src/linux-2.6/net/netfilter/ipset/ip_set_core.c: 20 linux/version.h not needed. /home/jj/src/linux-2.6/net/netfilter/xt_set.c: 16 linux/version.h not needed. and it seems that it is right. Beyond manually inspecting the source files I also did a few build tests with various configs to confirm that including the header in those files is indeed not needed. Here's a patch to remove the pointless includes. Signed-off-by: NJesper Juhl <jj@chaosbits.net> Acked-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 6月, 2011 1 次提交
-
-
由 Jesper Juhl 提交于
Remove the duplicate inclusion of net/icmp.h from net/ipv4/ping.c Signed-off-by: NJesper Juhl <jj@chaosbits.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 6月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Knut Tidemann found that first packet of a multicast flow was not correctly received, and bisected the regression to commit b23dd4fe (Make output route lookup return rtable directly.) Special thanks to Knut, who provided a very nice bug report, including sample programs to demonstrate the bug. Reported-and-bisectedby: Knut Tidemann <knut.andre.tidemann@jotron.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 6月, 2011 2 次提交
-
-
由 Eric Dumazet 提交于
A malicious user or buggy application can inject code and trigger an infinite loop in inet_diag_bc_audit() Also make sure each instruction is aligned on 4 bytes boundary, to avoid unaligned accesses. Reported-by: NDan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Le jeudi 16 juin 2011 à 23:38 -0400, David Miller a écrit : > From: Ben Hutchings <bhutchings@solarflare.com> > Date: Fri, 17 Jun 2011 00:50:46 +0100 > > > On Wed, 2011-06-15 at 04:15 +0200, Eric Dumazet wrote: > >> @@ -1594,6 +1594,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb) > >> goto discard; > >> > >> if (nsk != sk) { > >> + sock_rps_save_rxhash(nsk, skb->rxhash); > >> if (tcp_child_process(sk, nsk, skb)) { > >> rsk = nsk; > >> goto reset; > >> > > > > I haven't tried this, but it looks reasonable to me. > > > > What about IPv6? The logic in tcp_v6_do_rcv() looks very similar. > > Indeed ipv6 side needs the same fix. > > Eric please add that part and resubmit. And in fact I might stick > this into net-2.6 instead of net-next-2.6 > OK, here is the net-2.6 based one then, thanks ! [PATCH v2] net: rfs: enable RFS before first data packet is received First packet received on a passive tcp flow is not correctly RFS steered. One sock_rps_record_flow() call is missing in inet_accept() But before that, we also must record rxhash when child socket is setup. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: NDavid S. Miller <davem@conan.davemloft.net>
-
- 16 6月, 2011 5 次提交
-
-
由 Julian Anastasov 提交于
Avoid double seq adjustment for loopback traffic because it causes silent repetition of TCP data. One example is passive FTP with DNAT rule and difference in the length of IP addresses. This patch adds check if packet is sent and received via loopback device. As the same conntrack is used both for outgoing and incoming direction, we restrict seq adjustment to happen only in POSTROUTING. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NPatrick McHardy <kaber@trash.net>
-
由 Nicolas Cavallari 提交于
By default, when broadcast or multicast packet are sent from a local application, they are sent to the interface then looped by the kernel to other local applications, going throught netfilter hooks in the process. These looped packet have their MAC header removed from the skb by the kernel looping code. This confuse various netfilter's netlink queue, netlink log and the legacy ip_queue, because they try to extract a hardware address from these packets, but extracts a part of the IP header instead. This patch prevent NFQUEUE, NFLOG and ip_QUEUE to include a MAC header if there is none in the packet. Signed-off-by: NNicolas Cavallari <cavallar@lri.fr> Signed-off-by: NPatrick McHardy <kaber@trash.net>
-
由 Patrick McHardy 提交于
Userspace allows to specify inversion for IP header ECN matches, the kernel silently accepts it, but doesn't invert the match result. Signed-off-by: NPatrick McHardy <kaber@trash.net>
-
由 Patrick McHardy 提交于
Check for protocol inversion in ecn_mt_check() and remove the unnecessary runtime check for IPPROTO_TCP in ecn_mt(). Signed-off-by: NPatrick McHardy <kaber@trash.net>
-
Signed-off-by: NSebastian Andrzej Siewior <sebastian@breakpoint.cc> Signed-off-by: NPatrick McHardy <kaber@trash.net>
-
- 12 6月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
SNMP mibs use two percpu arrays, one used in BH context, another in USER context. With increasing number of cpus in machines, and fact that ipv6 uses per network device ipstats_mib, this is consuming a lot of memory if many network devices are registered. commit be281e55 (ipv6: reduce per device ICMP mib sizes) shrinked percpu needs for ipv6, but we can reduce memory use a bit more. With recent percpu infrastructure (irqsafe_cpu_inc() ...), we no longer need this BH/USER separation since we can update counters in a single x86 instruction, regardless of the BH/USER context. Other arches than x86 might need to disable irq in their irqsafe_cpu_inc() implementation : If this happens to be a problem, we can make SNMP_ARRAY_SZ arch dependent, but a previous poll ( https://lkml.org/lkml/2011/3/17/174 ) to arch maintainers did not raise strong opposition. Only on 32bit arches, we need to disable BH for 64bit counters updates done from USER context (currently used for IP MIB) This also reduces vmlinux size : 1) x86_64 build $ size vmlinux.before vmlinux.after text data bss dec hex filename 7853650 1293772 1896448 11043870 a8841e vmlinux.before 7850578 1293772 1896448 11040798 a8781e vmlinux.after 2) i386 build $ size vmlinux.before vmlinux.afterpatch text data bss dec hex filename 6039335 635076 3670016 10344427 9dd7eb vmlinux.before 6037342 635076 3670016 10342434 9dd022 vmlinux.afterpatch Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <andi@firstfloor.org> CC: Ingo Molnar <mingo@elte.hu> CC: Tejun Heo <tj@kernel.org> CC: Christoph Lameter <cl@linux-foundation.org> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org CC: linux-arch@vger.kernel.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 6月, 2011 2 次提交
-
-
由 Greg Rose 提交于
The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: NGreg Rose <gregory.v.rose@intel.com> Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
-
由 Steffen Klassert 提交于
We assume that transhdrlen is positive on the first fragment which is wrong for raw packets. So we don't add exthdrlen to the packet size for raw packets. This leads to a reallocation on IPsec because we have not enough headroom on the skb to place the IPsec headers. This patch fixes this by adding exthdrlen to the packet size whenever the send queue of the socket is empty. This issue was introduced with git commit 1470ddf7 (inet: Remove explicit write references to sk/inet in ip_append_data) Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 6月, 2011 3 次提交
-
-
由 Eric Dumazet 提交于
commit 2c8cec5c (ipv4: Cache learned PMTU information in inetpeer) added some racy peer->pmtu_expires accesses. As its value can be changed by another cpu/thread, we should be more careful, reading its value once. Add peer_pmtu_expired() and peer_pmtu_cleaned() helpers Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Andi Kleen and Tim Chen reported huge contention on inetpeer unused_peers.lock, on memcached workload on a 40 core machine, with disabled route cache. It appears we constantly flip peers refcnt between 0 and 1 values, and we must insert/remove peers from unused_peers.list, holding a contended spinlock. Remove this list completely and perform a garbage collection on-the-fly, at lookup time, using the expired nodes we met during the tree traversal. This removes a lot of code, makes locking more standard, and obsoletes two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also removes two pointers in inet_peer structure. There is still a false sharing effect because refcnt is in first cache line of object [were the links and keys used by lookups are located], we might move it at the end of inet_peer structure to let this first cache line mostly read by cpus. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <andi@firstfloor.org> CC: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jerry Chu 提交于
This patch lowers the default initRTO from 3secs to 1sec per RFC2988bis. It falls back to 3secs if the SYN or SYN-ACK packet has been retransmitted, AND the TCP timestamp option is not on. It also adds support to take RTT sample during 3WHS on the passive open side, just like its active open counterpart, and uses it, if valid, to seed the initRTO for the data transmission phase. The patch also resets ssthresh to its initial default at the beginning of the data transmission phase, and reduces cwnd to 1 if there has been MORE THAN ONE retransmission during 3WHS per RFC5681. Signed-off-by: NH.K. Jerry Chu <hkchu@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 6月, 2011 4 次提交
-
-
由 Dave Jones 提交于
Netlink message lengths can't be negative, so use unsigned variables. Signed-off-by: NDave Jones <davej@redhat.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
This patch fixes a refcount leak of ct objects that may occur if l4proto->error() assigns one conntrack object to one skbuff. In that case, we have to skip further processing in nf_conntrack_in(). With this patch, we can also fix wrong return values (-NF_ACCEPT) for special cases in ICMP[v6] that should not bump the invalid/error statistic counters. Reported-by: NZoltan Menyhart <Zoltan.Menyhart@bull.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Julian Anastasov 提交于
Fix crash in nf_nat_csum when mangling packets in OUTPUT hook where skb->dev is not defined, it is set later before POSTROUTING. Problem happens for CHECKSUM_NONE. We can check device from rt but using CHECKSUM_PARTIAL should be safe (skb_checksum_help). Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Eric Dumazet 提交于
Following error is raised (and other similar ones) : net/ipv4/netfilter/nf_nat_standalone.c: In function ‘nf_nat_fn’: net/ipv4/netfilter/nf_nat_standalone.c:119:2: warning: case value ‘4’ not in enumerated type ‘enum ip_conntrack_info’ gcc barfs on adding two enum values and getting a not enumerated result : case IP_CT_RELATED+IP_CT_IS_REPLY: Add missing enum values Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 02 6月, 2011 1 次提交
-
-
由 Marcus Meissner 提交于
Check against mistakenly passing in IPv6 addresses (which would result in an INADDR_ANY bind) or similar incompatible sockaddrs. Signed-off-by: NMarcus Meissner <meissner@suse.de> Cc: Reinhard Max <max@suse.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 6月, 2011 1 次提交
-
-
由 Chris Metcalf 提交于
The current code takes an unaligned pointer and does htonl() on it to make it big-endian, then does a memcpy(). The problem is that the compiler decides that since the pointer is to a __be32, it is legal to optimize the copy into a processor word store. However, on an architecture that does not handled unaligned writes in kernel space, this produces an unaligned exception fault. The solution is to track the pointer as a "char *" (which removes a bunch of unpleasant casts in any case), and then just use put_unaligned_be32() to write the value to memory. Signed-off-by: NChris Metcalf <cmetcalf@tilera.com> Signed-off-by: NDavid S. Miller <davem@zippy.davemloft.net>
-
- 28 5月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Several crashes in cleanup_once() were reported in recent kernels. Commit d6cc1d64 (inetpeer: various changes) added a race in unlink_from_unused(). One way to avoid taking unused_peers.lock before doing the list_empty() test is to catch 0->1 refcnt transitions, using full barrier atomic operations variants (atomic_cmpxchg() and atomic_inc_return()) instead of previous atomic_inc() and atomic_add_unless() variants. We then call unlink_from_unused() only for the owner of the 0->1 transition. Add a new atomic_add_unless_return() static helper With help from Arun Sharma. Refs: https://bugzilla.kernel.org/show_bug.cgi?id=32772Reported-by: NArun Sharma <asharma@fb.com> Reported-by: NMaximilian Engelhardt <maxi@daemonizer.de> Reported-by: NYann Dupont <Yann.Dupont@univ-nantes.fr> Reported-by: NDenys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 5月, 2011 1 次提交
-
-
由 Veaceslav Falico 提交于
In igmp_group_dropped() we call ip_mc_clear_src(), which resets the number of source filters per mulitcast. However, igmp_group_dropped() is also called on NETDEV_DOWN, NETDEV_PRE_TYPE_CHANGE and NETDEV_UNREGISTER, which means that the group might get added back on NETDEV_UP, NETDEV_REGISTER and NETDEV_POST_TYPE_CHANGE respectively, leaving us with broken source filters. To fix that, we must clear the source filters only when there are no users in the ip_mc_list, i.e. in ip_mc_dec_group() and on device destroy. Acked-by: NDavid L Stevens <dlstevens@us.ibm.com> Signed-off-by: NVeaceslav Falico <vfalico@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 5月, 2011 3 次提交
-
-
由 Eric Dumazet 提交于
All static seqlock should be initialized with the lockdep friendly __SEQLOCK_UNLOCKED() macro. Remove legacy SEQLOCK_UNLOCKED() macro. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Link: http://lkml.kernel.org/r/%3C1306238888.3026.31.camel%40edumazet-laptop%3ESigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Dan Rosenberg 提交于
The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". The supporting code for kptr_restrict and %pK are currently in the -mm tree. This patch converts users of %p in net/ to %pK. Cases of printing pointers to the syslog are not covered, since this would eliminate useful information for postmortem debugging and the reading of the syslog is already optionally protected by the dmesg_restrict sysctl. Signed-off-by: NDan Rosenberg <drosenberg@vsecurity.com> Cc: James Morris <jmorris@namei.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Kees Cook <kees.cook@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
net/ipv4/ping.c: In function ‘ping_v4_unhash’: net/ipv4/ping.c:140:28: warning: variable ‘hslot’ set but not used Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Vasiliy Kulikov <segoon@openwall.com> Acked-by: NVasiliy Kulikov <segoon@openwall.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 5月, 2011 3 次提交
-
-
由 Paul Gortmaker 提交于
After discovering that wide use of prefetch on modern CPUs could be a net loss instead of a win, net drivers which were relying on the implicit inclusion of prefetch.h via the list headers showed up in the resulting cleanup fallout. Give them an explicit include via the following $0.02 script. ========================================= #!/bin/bash MANUAL="" for i in `git grep -l 'prefetch(.*)' .` ; do grep -q '<linux/prefetch.h>' $i if [ $? = 0 ] ; then continue fi ( echo '?^#include <linux/?a' echo '#include <linux/prefetch.h>' echo . echo w echo q ) | ed -s $i > /dev/null 2>&1 if [ $? != 0 ]; then echo $i needs manual fixup MANUAL="$i $MANUAL" fi done echo ------------------- 8\<---------------------- echo vi $MANUAL ========================================= Signed-off-by: NPaul <paul.gortmaker@windriver.com> [ Fixed up some incorrect #include placements, and added some non-network drivers and the fib_trie.c case - Linus ] Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Jones 提交于
Add a stack backtrace to the ip_rt_bug path for debugging Signed-off-by: NDave Jones <davej@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 5月, 2011 3 次提交
-
-
由 Micha Nelissen 提交于
v3 -> v4: fix return boolean false instead of 0 for ic_is_init_dev Currently the ip auto configuration has a hardcoded delay of 1 second. When (ethernet) link takes longer to come up (e.g. more than 3 seconds), nfs root may not be found. Remove the hardcoded delay, and wait for carrier on at least one network device. Signed-off-by: NMicha Nelissen <micha@neli.hopto.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Changli Gao 提交于
The characters in a line should be no more than 80. Signed-off-by: NChangli Gao <xiaosuo@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Changli Gao 提交于
As these functions are only used in this file. Signed-off-by: NChangli Gao <xiaosuo@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 5月, 2011 4 次提交
-
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This will next trickle down to rt_bind_peer(). Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This way the caller can get at the fully resolved fl4->{daddr,saddr} etc. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
It's way past it's usefulness. And this gets rid of a bunch of stray ->rt_{dst,src} references. Even the comment documenting the macro was inaccurate (stated default was 1 when it's 0). If reintroduced, it should be done properly, with dynamic debug facilities. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 5月, 2011 1 次提交
-
-
由 David S. Miller 提交于
Noticed by Joe Perches. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-