1. 30 January 2009 (1 commit)
    • gro: Avoid copying headers of unmerged packets · 86911732
      Committed by Herbert Xu
      Unfortunately simplicity isn't always the best.  The fraginfo
      interface turned out to be suboptimal.  The problem was quite
      obvious.  For every packet, we have to copy the headers from
      the frags structure into skb->head, even though for 99% of the
      packets this part is immediately thrown away after the merge.
      
      LRO didn't have this problem because it directly read the headers
      from the frags structure.
      
      This patch attempts to address this by creating an interface
      that allows GRO to access the headers in the first frag without
      having to copy them.  Because all drivers that use frags place the
      headers in the first frag, this optimisation should be enough.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
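
      A minimal sketch of the idea (illustrative only; the struct and
      function names below are made up and are not the GRO interface added
      by this commit): return a pointer straight into the first fragment
      when the requested header fits there, and only fall back to a
      linearising copy otherwise.

      /* Hypothetical stand-ins for skb_frag_t / struct sk_buff. */
      #include <stddef.h>

      struct frag_desc {              /* a page fragment descriptor */
      	void   *data;           /* mapped address of the fragment */
      	size_t  len;            /* bytes available in the fragment */
      };

      struct pkt {                    /* stand-in for struct sk_buff */
      	struct frag_desc first_frag;
      };

      /*
       * Return a pointer to hlen bytes of packet header.  In the common case
       * (all drivers that use frags put the headers in the first frag) this
       * is just a pointer into that fragment, so nothing is copied and
       * nothing has to be thrown away after the merge.
       */
      static void *gro_header(struct pkt *p, size_t hlen)
      {
      	if (hlen <= p->first_frag.len)
      		return p->first_frag.data;      /* fast path: zero copy */

      	/* Rare slow path: the header spans beyond the first fragment; a
      	 * real implementation would linearise it into skb->head here.
      	 */
      	return NULL;
      }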
  2. 27 January 2009 (1 commit)
  3. 23 January 2009 (9 commits)
  4. 22 January 2009 (2 commits)
    • gre: strict physical device binding · 749c10f9
      Committed by Timo Teras
      Check the device on the receive path and allow otherwise identical
      tunnels as long as the physical device differs.
      
      This is useful for NBMA tunnels, where you want to use a different GRE
      IP for each public IP available via different physical devices.
      Signed-off-by: Timo Teras <timo.teras@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
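
      Roughly, the change amounts to also comparing the underlying physical
      device during tunnel lookup on the receive path.  A hedged sketch
      under made-up names (this is not the actual ipgre lookup code):

      #include <stddef.h>
      #include <stdint.h>

      /* Illustrative tunnel descriptor: endpoint addresses plus the ifindex
       * of the physical device it is bound to (0 means "any device").
       */
      struct tunnel {
      	uint32_t       remote;     /* remote tunnel endpoint */
      	uint32_t       local;      /* local tunnel endpoint */
      	int            link;       /* ifindex of bound physical device */
      	struct tunnel *next;
      };

      /* Find a tunnel matching a received packet, skipping tunnels that are
       * bound to a different physical device than the one the packet arrived
       * on.  Otherwise identical tunnels can therefore coexist as long as
       * their physical devices differ.
       */
      static struct tunnel *tunnel_lookup(struct tunnel *head,
      				    uint32_t remote, uint32_t local,
      				    int in_ifindex)
      {
      	struct tunnel *t;

      	for (t = head; t != NULL; t = t->next) {
      		if (t->remote != remote || t->local != local)
      			continue;
      		if (t->link && t->link != in_ifindex)
      			continue;       /* bound to another device */
      		return t;
      	}
      	return NULL;
      }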
    • inet: Allowing more than 64k connections and heavily optimize bind(0) time. · a9d8f911
      Committed by Evgeniy Polyakov
      With a simple extension to the binding mechanism that allows binding
      more than 64k sockets (or fewer, depending on sysctl parameters), we
      have to traverse the whole bind hash table to find an empty bucket.
      While that is not a problem for, say, 32k connections, bind()
      completion time grows exponentially (since after each successful
      binding we have to traverse one more bucket to find an empty one),
      even if we start each time from a random offset inside the hash table.
      
      So, when the hash table is full and we want to add another socket, we
      have to traverse the whole table no matter what; effectively this is
      the worst-case performance, and it is constant.
      
      The attached picture shows bind() time depending on the number of
      already bound sockets.
      
      The green area corresponds to the usual bind-to-port-zero process,
      which turns on kernel port selection as described above.  The red area
      is the bind process when the number of reuse-bound sockets is not
      limited to 64k (or by the sysctl parameters).  The same exponential
      growth (hidden by the green area) occurs before the number of ports
      reaches the sysctl limit.
      
      At that point the bind hash table has exactly one reuse-enabled socket
      per bucket, but those sockets may have different addresses.  The
      kernel actually selects the first port to try at random, so at the
      beginning bind will take roughly constant time, but over time the
      number of ports to check after the random start increases.  That
      growth is exponential, but because of the random selection not every
      port selection necessarily takes longer than the previous one, so we
      have to consider the area below the curve in the graph (zooming in
      would show many different times there); one area can hide another.
      
      The blue area corresponds to the port selection optimization.
      
      The design approach is rather simple: the hash table now maintains an
      (imprecise and racily updated) count of currently bound sockets, and
      when the number of such sockets becomes greater than a predefined
      value (I use the maximum port range defined by the sysctls), we stop
      traversing the whole bind hash table and just stop at the first
      matching bucket after the random start.  The limit above roughly
      corresponds to the case when the bind hash table is full and we have
      turned on the mechanism that allows binding more reuse-enabled
      sockets, so it does not change the behaviour of other sockets.
      Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
      Tested-by: Denys Fedoryschenko <denys@visp.net.lb>
      Signed-off-by: David S. Miller <davem@davemloft.net>
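
      A hedged sketch of the strategy just described (the structures and
      names are invented for illustration; this is not the kernel's
      inet_csk_get_port()): keep an approximate count of bound sockets and,
      once it exceeds the configured port range, stop scanning for an empty
      bucket and settle for the first candidate after a random start.

      #include <stdlib.h>

      struct bind_bucket {
      	int owners;                 /* sockets bound in this bucket */
      };

      struct bind_table {
      	struct bind_bucket *buckets;
      	int nbuckets;
      	int bound_sockets;          /* imprecise, racily updated counter */
      };

      /* Pick a bucket for bind(0).  Below the threshold, behave as before
       * and look for an empty bucket starting at a random offset; above it
       * the table is effectively full, so take the first bucket after the
       * random start instead of walking the whole table.
       */
      static int pick_bucket(struct bind_table *tb, int port_range)
      {
      	int start = rand() % tb->nbuckets;
      	int i, fallback = -1;

      	if (tb->bound_sockets > port_range)
      		return start;       /* table full: first candidate wins */

      	for (i = 0; i < tb->nbuckets; i++) {
      		int idx = (start + i) % tb->nbuckets;

      		if (tb->buckets[idx].owners == 0)
      			return idx;         /* found an empty bucket */
      		if (fallback < 0)
      			fallback = idx;     /* remember something usable */
      	}
      	return fallback;            /* no empty bucket left */
      }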
  5. 15 January 2009 (1 commit)
  6. 14 January 2009 (1 commit)
    • tcp: splice as many packets as possible at once · 33966dd0
      Committed by Willy Tarreau
      As spotted by Willy Tarreau, the current splice() from a TCP socket to
      a pipe is not optimal: it processes at most one segment per call.
      This results in low performance and very high overhead due to the
      syscall rate when splicing from interfaces which do not support LRO.
      
      Willy provided a patch inside tcp_splice_read(), but a better fix
      is to let tcp_read_sock() process as many segments as possible, so
      that tcp_rcv_space_adjust() and tcp_cleanup_rbuf() are called less
      often.
      
      With this change, splice() behaves like tcp_recvmsg(), being able
      to consume many skbs in one system call. With typical 1460 bytes
      of payload per frame, that means splice(SPLICE_F_NONBLOCK) can return
      16*1460 = 23360 bytes.
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
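
      For context, a userspace sketch of the splice() path this change
      speeds up (error handling trimmed; sock_fd is assumed to be a
      connected TCP socket).  With the patch, each non-blocking splice()
      from the socket into the pipe can move many queued segments at once
      instead of just one.

      #define _GNU_SOURCE
      #include <fcntl.h>
      #include <unistd.h>

      /* Move data from a connected TCP socket to out_fd via a pipe. */
      static long tcp_splice_to(int sock_fd, int out_fd)
      {
      	int pipefd[2];
      	long total = 0;

      	if (pipe(pipefd) < 0)
      		return -1;

      	for (;;) {
      		ssize_t n = splice(sock_fd, NULL, pipefd[1], NULL, 65536,
      				   SPLICE_F_NONBLOCK | SPLICE_F_MOVE);
      		if (n <= 0)
      			break;          /* EOF, EAGAIN or error */
      		/* push onward; a real program would handle short writes */
      		splice(pipefd[0], NULL, out_fd, NULL, n, SPLICE_F_MOVE);
      		total += n;
      	}

      	close(pipefd[0]);
      	close(pipefd[1]);
      	return total;
      }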
  7. 13 January 2009 (2 commits)
  8. 09 January 2009 (1 commit)
  9. 07 January 2009 (2 commits)
  10. 05 January 2009 (3 commits)
  11. 01 January 2009 (1 commit)
  12. 30 December 2008 (2 commits)
  13. 26 December 2008 (4 commits)
  14. 19 December 2008 (1 commit)
  15. 16 December 2008 (3 commits)
    • ipmr: merge common code · b1879204
      Committed by Ilpo Järvinen
      This also removes a redundant skb->len < x check, which cannot
      be true once pskb_may_pull(skb, x) has succeeded.
      
      $ diff-funcs pim_rcv ipmr.c ipmr.c pim_rcv_v1
        --- ipmr.c:pim_rcv()
        +++ ipmr.c:pim_rcv_v1()
      @@ -1,22 +1,27 @@
      -static int pim_rcv(struct sk_buff * skb)
      +int pim_rcv_v1(struct sk_buff * skb)
       {
      -	struct pimreghdr *pim;
      +	struct igmphdr *pim;
       	struct iphdr   *encap;
       	struct net_device  *reg_dev = NULL;
      
       	if (!pskb_may_pull(skb, sizeof(*pim) + sizeof(*encap)))
       		goto drop;
      
      -	pim = (struct pimreghdr *)skb_transport_header(skb);
      -	if (pim->type != ((PIM_VERSION<<4)|(PIM_REGISTER)) ||
      -	    (pim->flags&PIM_NULL_REGISTER) ||
      -	    (ip_compute_csum((void *)pim, sizeof(*pim)) != 0 &&
      -	     csum_fold(skb_checksum(skb, 0, skb->len, 0))))
      +	pim = igmp_hdr(skb);
      +
      +	if (!mroute_do_pim ||
      +	    skb->len < sizeof(*pim) + sizeof(*encap) ||
      +	    pim->group != PIM_V1_VERSION || pim->code != PIM_V1_REGISTER)
       		goto drop;
      
      -	/* check if the inner packet is destined to mcast group */
       	encap = (struct iphdr *)(skb_transport_header(skb) +
      -				 sizeof(struct pimreghdr));
      +				 sizeof(struct igmphdr));
      +	/*
      +	   Check that:
      +	   a. packet is really destinted to a multicast group
      +	   b. packet is not a NULL-REGISTER
      +	   c. packet is not truncated
      +	 */
       	if (!ipv4_is_multicast(encap->daddr) ||
       	    encap->tot_len == 0 ||
       	    ntohs(encap->tot_len) + sizeof(*pim) > skb->len)
      @@ -40,9 +45,9 @@
       	skb->ip_summed = 0;
       	skb->pkt_type = PACKET_HOST;
       	dst_release(skb->dst);
      +	skb->dst = NULL;
       	reg_dev->stats.rx_bytes += skb->len;
       	reg_dev->stats.rx_packets++;
      -	skb->dst = NULL;
       	nf_reset(skb);
       	netif_rx(skb);
       	dev_put(reg_dev);
      
      $ codiff net/ipv4/ipmr.o.old net/ipv4/ipmr.o.new
      
      net/ipv4/ipmr.c:
        pim_rcv_v1 | -283
        pim_rcv    | -284
       2 functions changed, 567 bytes removed
      
      net/ipv4/ipmr.c:
        __pim_rcv | +307
       1 function changed, 307 bytes added
      
      net/ipv4/ipmr.o.new:
       3 functions changed, 307 bytes added, 567 bytes removed, diff: -260
      
      (Tested on x86_64).
      
      It seems that the pimlen arg could be left out as well, with the
      equal size of the structs trapped by a BUILD_BUG_ON, but I don't
      think that's more than a cosmetic flaw since there aren't that
      many args anyway.
      
      Compile tested.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
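
      For readability, a hedged reconstruction of what the shared helper
      could look like, pieced together only from the common lines visible in
      the diff above (the actual __pim_rcv() in the patch may differ, and
      the steps elided in the diff, such as finding the register device and
      stripping the outer headers, are omitted here as well):

      /* Kernel-context sketch; pimlen is the size of the outer PIM/IGMP
       * header, reg_dev the pimreg device the inner packet is handed to.
       */
      static int pim_rcv_common_sketch(struct sk_buff *skb,
      				 unsigned int pimlen,
      				 struct net_device *reg_dev)
      {
      	struct iphdr *encap;

      	/* the inner packet must be multicast, non-empty, not truncated */
      	encap = (struct iphdr *)(skb_transport_header(skb) + pimlen);
      	if (!ipv4_is_multicast(encap->daddr) ||
      	    encap->tot_len == 0 ||
      	    ntohs(encap->tot_len) + pimlen > skb->len)
      		return -1;                      /* caller drops the skb */

      	/* re-inject the decapsulated packet via the register device */
      	skb->ip_summed = 0;
      	skb->pkt_type = PACKET_HOST;
      	dst_release(skb->dst);
      	skb->dst = NULL;
      	reg_dev->stats.rx_bytes += skb->len;
      	reg_dev->stats.rx_packets++;
      	nf_reset(skb);
      	netif_rx(skb);
      	dev_put(reg_dev);
      	return 0;
      }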
    • tcp: Add GRO support · bf296b12
      Committed by Herbert Xu
      This patch adds the TCP-specific portion of GRO.  The criterion for
      merging is extremely strict (the TCP header must match exactly apart
      from the checksum) so as to allow refragmentation.  Otherwise this
      is pretty much identical to LRO, except that we support the merging
      of ECN packets.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
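
      A hedged illustration of that merge criterion (not the actual
      tcp_gro_receive() code; TCP options handling is omitted): two fixed
      TCP headers match if everything but the checksum is identical and the
      new segment starts exactly where the previous one ended.

      #include <netinet/tcp.h>    /* struct tcphdr (Linux field names) */
      #include <arpa/inet.h>      /* ntohl */
      #include <string.h>

      /* Can the segment with header th2 be merged onto the flow whose
       * previous segment carried header th1 and prev_payload_len bytes of
       * payload?  Everything except the checksum (and, naturally, the
       * sequence number) must match exactly so the aggregate can later be
       * re-segmented without losing information.
       */
      static int tcp_headers_mergeable(const struct tcphdr *th1,
      				 const struct tcphdr *th2,
      				 unsigned int prev_payload_len)
      {
      	struct tcphdr a = *th1, b = *th2;

      	a.check = b.check = 0;          /* checksum may differ */
      	a.seq = b.seq = 0;              /* sequence checked below */
      	if (memcmp(&a, &b, sizeof(a)) != 0)
      		return 0;               /* ports, flags, window, ... differ */

      	/* the new segment must continue where the previous one ended */
      	return ntohl(th2->seq) == ntohl(th1->seq) + prev_payload_len;
      }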
    • ipv4: Add GRO infrastructure · 73cc19f1
      Committed by Herbert Xu
      This patch adds GRO support for IPv4.
      
      The criteria for merging are more stringent than with LRO; in
      particular, we require all fields in the IP header to be identical
      except for the length, ID and checksum.  In addition, the ID must
      form an arithmetic sequence with a difference of one.
      
      The ID requirement might seem overly strict; however, most hardware
      TSO solutions already obey this rule.  Linux itself also obeys it
      whether GSO is in use or not.
      
      In future we could relax this rule by storing the IDs (or rather
      making sure that we don't drop them when pulling the aggregate
      skb's tail).
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
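
      A hedged sketch of those criteria (illustration only, not
      inet_gro_receive(); IP options are not handled): every header field
      must match except tot_len, id and check, and the IDs must increase by
      exactly one.

      #include <netinet/ip.h>     /* struct iphdr */
      #include <arpa/inet.h>      /* ntohs */
      #include <stdint.h>
      #include <string.h>

      /* May the packet with header h2 be merged onto the aggregate whose
       * most recent header is h1?
       */
      static int ipv4_headers_mergeable(const struct iphdr *h1,
      				  const struct iphdr *h2)
      {
      	struct iphdr a = *h1, b = *h2;

      	if (h1->ihl != 5 || h2->ihl != 5)
      		return 0;               /* keep the sketch option-less */

      	/* mask out the fields that are allowed to differ */
      	a.tot_len = b.tot_len = 0;
      	a.id      = b.id      = 0;
      	a.check   = b.check   = 0;
      	if (memcmp(&a, &b, sizeof(a)) != 0)
      		return 0;

      	/* IDs must form an arithmetic sequence with a difference of one */
      	return (uint16_t)(ntohs(h2->id) - ntohs(h1->id)) == 1;
      }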
  16. 15 December 2008 (2 commits)
  17. 09 December 2008 (1 commit)
    • tcp: tcp_vegas cong avoid fix · 8d3a564d
      Committed by Doug Leith
      This patch addresses a book-keeping issue in tcp_vegas.c.  At present
      tcp_vegas does separate book-keeping of cwnd based on packet sequence
      numbers.  A mismatch can develop between this book-keeping and
      tp->snd_cwnd due, for example, to delayed acks acking multiple
      packets.  When vegas transitions to reno operation (e.g. following
      loss), then this mismatch leads to incorrect behaviour (akin to a cwnd
      backoff).  This seems mostly to affect operation at low cwnds where
      delayed acking can lead to a significant fraction of cwnd being
      covered by a single ack, leading to the book-keeping mismatch.  This
      patch modifies the congestion avoidance update to avoid the need for
      separate book-keeping while leaving vegas congestion avoidance
      functionally unchanged.  A secondary advantage of this modification is
      that the use of fixed-point (via V_PARAM_SHIFT) and 64 bit arithmetic
      is no longer necessary, simplifying the code.
      
      Some example test measurements with the patched code (confirming no functional
      change in the congestion avoidance algorithm) can be seen at:
      
      http://www.hamilton.ie/doug/vegaspatch/
      Signed-off-by: Doug Leith <doug.leith@nuim.ie>
      Signed-off-by: David S. Miller <davem@davemloft.net>
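
      For context, the textbook Vegas congestion-avoidance rule that this
      book-keeping feeds into (a plain-integer sketch, not the patched
      tcp_vegas.c; the alpha and beta values are illustrative):

      /* Estimate how many extra packets the connection keeps queued in the
       * network and steer cwnd so that this stays between alpha and beta:
       *   expected = cwnd / base_rtt, actual = cwnd / rtt,
       *   diff_in_packets = (expected - actual) * base_rtt
       *                   = cwnd * (rtt - base_rtt) / rtt
       */
      static unsigned int vegas_next_cwnd(unsigned int cwnd,
      				    unsigned int base_rtt_us, /* min RTT */
      				    unsigned int rtt_us)      /* last RTT */
      {
      	const unsigned int alpha = 2, beta = 4;  /* illustrative */
      	unsigned int diff;

      	if (rtt_us <= base_rtt_us)
      		return cwnd + 1;        /* no queueing seen: probe upwards */

      	diff = cwnd * (rtt_us - base_rtt_us) / rtt_us;

      	if (diff > beta)
      		return cwnd > 2 ? cwnd - 1 : 2;  /* queue building: back off */
      	if (diff < alpha)
      		return cwnd + 1;                 /* spare capacity: grow */
      	return cwnd;                             /* in band: hold */
      }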
  18. 06 December 2008 (3 commits)