提交 · 91cc17c0e5e5ada156a8d5787a2509d263ea6bbf · openanolis / cloud-kernel

23 11月, 2007 1 次提交

[TCP]: MTUprobe: receiver window & data available checks fixed · 91cc17c0

由 Ilpo Järvinen 提交于 11月 23, 2007

It seems that the checked range for receiver window check should
begin from the first rather than from the last skb that is going
to be included to the probe. And that can be achieved without
reference to skbs at all, snd_nxt and write_seq provides the
correct seqno already. Plus, it SHOULD account packets that are
necessary to trigger fast retransmit [RFC4821].

Location of snd_wnd < probe_size/size_needed check is bogus
because it will cause the other if() match as well (due to
snd_nxt >= snd_una invariant).

Removed dead obvious comment.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

91cc17c0

21 11月, 2007 4 次提交

[IPVS]: Fix compiler warning about unused register_ip_vs_protocol · d535a916

由 Pavel Emelyanov 提交于 11月 20, 2007

This is silly, but I have turned the CONFIG_IP_VS to m,
to check the compilation of one (recently sent) fix
and set all the CONFIG_IP_VS_PROTO_XXX options to n to
speed up the compilation.

In this configuration the compiler warns me about

  CC [M]  net/ipv4/ipvs/ip_vs_proto.o
net/ipv4/ipvs/ip_vs_proto.c:49: warning: 'register_ip_vs_protocol' defined but not used

Indeed. With no protocols selected there are no
calls to this function - all are compiled out with
ifdefs.

Maybe the best fix would be to surround this call with
ifdef-s or tune the Kconfig dependences, but I think that
marking this register function as __used is enough. No?
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Acked-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d535a916

[ARP]: Fix arp reply when sender ip 0 · b4a9811c

由 Jonas Danielsson 提交于 11月 20, 2007

Fix arp reply when received arp probe with sender ip 0.

Send arp reply with target ip address 0.0.0.0 and target hardware
address set to hardware address of requester. Previously sent reply
with target ip address and target hardware address set to same as
source fields.
Signed-off-by: NJonas Danielsson <the.sator@gmail.com>
Acked-by: NAlexey Kuznetov <kuznet@ms2.inr.ac.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b4a9811c

Y
[IPV4] TCPMD5: Use memmove() instead of memcpy() because we have overlaps. · 354faf09
由 YOSHIFUJI Hideaki 提交于 11月 20, 2007
```
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
354faf09
Y
[IPV4] TCPMD5: Omit redundant NULL check for kfree() argument. · a80cc20d
由 YOSHIFUJI Hideaki 提交于 11月 20, 2007
```
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
a80cc20d

20 11月, 2007 7 次提交

[NETFILTER]: Fix kernel panic with REDIRECT target. · 1f305323

由 Evgeniy Polyakov 提交于 11月 20, 2007

When connection tracking entry (nf_conn) is about to copy itself it can
have some of its extension users (like nat) as being already freed and
thus not required to be copied.

Actually looking at this function I suspect it was copied from
nf_nat_setup_info() and thus bug was introduced.

Report and testing from David <david@unsolicited.net>.

[ Patrick McHardy states:

	I now understand whats happening:

	- new connection is allocated without helper
	- connection is REDIRECTed to localhost
	- nf_nat_setup_info adds NAT extension, but doesn't initialize it yet
	- nf_conntrack_alter_reply performs a helper lookup based on the
	   new tuple, finds the SIP helper and allocates a helper extension,
	   causing reallocation because of too little space
	- nf_nat_move_storage is called with the uninitialized nat extension

	So your fix is entirely correct, thanks a lot :)  ]
Signed-off-by: NEvgeniy Polyakov <johnpol@2ka.mipt.ru>
Acked-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f305323

[IPV4]: Add missing "space" · 464c4f18

由 Joe Perches 提交于 11月 19, 2007

Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

464c4f18

[TCP]: Problem bug with sysctl_tcp_congestion_control function · 5487796f

由 Sam Jansen 提交于 11月 19, 2007

From: "Sam Jansen" <sjansen@google.com>

sysctl_tcp_congestion_control seems to have a bug that prevents it
from actually calling the tcp_set_default_congestion_control
function. This is not so apparent because it does not return an error
and generally the /proc interface is used to configure the default TCP
congestion control algorithm. This is present in 2.6.18 onwards and
probably earlier, though I have not inspected 2.6.15--2.6.17.

sysctl_tcp_congestion_control calls sysctl_string and expects a successful
return code of 0. In such a case it actually sets the congestion control
algorithm with tcp_set_default_congestion_control. Otherwise, it returns the
value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string
returned 0 on success. However, sysctl_string was updated to return 1 on
success around about 2.6.15 and sysctl_tcp_congestion_control was not updated.
Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy
converts this return code to '0', so the caller never notices the error.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5487796f

[TCP] MTUprobe: fix potential sk_send_head corruption · 6e421410

由 Ilpo Jrvinen 提交于 11月 19, 2007

When the abstraction functions got added, conversion here was
made incorrectly. As a result, the skb may end up pointing
to skb which got included to the probe skb and then was freed.
For it to trigger, however, skb_transmit must fail sending as
well.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e421410

[IPVS]: Move remaining sysctl handlers over to CTL_UNNUMBERED · 9055fa1f

由 Simon Horman 提交于 11月 19, 2007

Switch the remaining IPVS sysctl entries over to to use CTL_UNNUMBERED,
I stronly doubt that anyone is using the sys_sysctl interface to
these variables.
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9055fa1f

[IPVS]: Fix sysctl warnings about missing strategy in schedulers · 9e103fa6

由 Simon Horman 提交于 11月 19, 2007

sysctl table check failed: /net/ipv4/vs/lblc_expiration .3.5.21.19 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/lblcr_expiration .3.5.21.20 Missing strategy

Switch these entried over to use CTL_UNNUMBERED as clearly
the sys_syscal portion wasn't working.

This is along the same lines as Christian Borntraeger's patch that fixes
up entries with no stratergy in net/ipv4/ipvs/ip_vs_ctl.c
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e103fa6

[IPVS]: Fix sysctl warnings about missing strategy · 611cd55b

由 Christian Borntraeger 提交于 11月 19, 2007

Running the latest git code I get the following messages during boot:
sysctl table check failed: /net/ipv4/vs/drop_entry .3.5.21.4 Missing strategy
[...]		  
sysctl table check failed: /net/ipv4/vs/drop_packet .3.5.21.5 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/secure_tcp .3.5.21.6 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/sync_threshold .3.5.21.24 Missing strategy

I removed the binary sysctl handler for those messages and also removed
the definitions in ip_vs.h. The alternative would be to implement a 
proper strategy handler, but syscall sysctl is deprecated.

There are other sysctl definitions that are commented out or work with 
the default sysctl_data strategy. I did not touch these. 
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

611cd55b

19 11月, 2007 1 次提交

[NET]: Corrects a bug in ip_rt_acct_read() · 483b23ff

由 Eric Dumazet 提交于 11月 16, 2007

It seems that stats of cpu 0 are counted twice, since
for_each_possible_cpu() is looping on all possible cpus, including 0

Before percpu conversion of ip_rt_acct, we should also remove the
assumption that CPU 0 is online (or even possible)
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

483b23ff

15 11月, 2007 3 次提交

[NET]: rt_check_expire() can take a long time, add a cond_resched() · d90bf5a9

由 Eric Dumazet 提交于 11月 14, 2007

On commit 39c90ece:

	[IPV4]: Convert rt_check_expire() from softirq processing to workqueue.

we converted rt_check_expire() from softirq to workqueue, allowing the
function to perform all work it was supposed to do.

When the IP route cache is big, rt_check_expire() can take a long time
to run.  (default settings : 20% of the hash table is scanned at each
invocation)

Adding cond_resched() helps giving cpu to higher priority tasks if
necessary.

Using a "if (need_resched())" test before calling "cond_resched();" is
necessary to avoid spending too much time doing the resched check.
(My tests gave a time reduction from 88 ms to 25 ms per
rt_check_expire() run on my i686 test machine)
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d90bf5a9

[TCP] FRTO: Clear frto_highmark only after process_frto that uses it · e1cd8f78

由 Ilpo Jrvinen 提交于 11月 14, 2007

I broke this in commit 3de96471:

	[TCP]: Wrap-safed reordering detection FRTO check

tcp_process_frto should always see a valid frto_highmark. An invalid
frto_highmark (zero) is very likely what ultimately caused a seqno
compare in tcp_frto_enter_loss to do the wrong leading to the LOST-bit
leak.

Having LOST-bits integry ensured like done after commit
23aeeec3:

	[TCP] FRTO: Plug potential LOST-bit leak

won't hurt. It may still be useful in some other, possibly legimate,
scenario.

Reported by Chazarain Guillaume <guichaz@yahoo.fr>.
Signed-off-by: NIlpo Jrvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e1cd8f78

[TCP]: Make sure write_queue_from does not begin with NULL ptr · 96a2d41a

由 Ilpo Jrvinen 提交于 11月 14, 2007

NULL ptr can be returned from tcp_write_queue_head to cached_skb
and then assigned to skb if packets_out was zero. Without this,
system is vulnerable to a carefully crafted ACKs which obviously
is remotely triggerable.

Besides, there's very little that needs to be done in sacktag
if there weren't any packets outstanding, just skipping the rest
doesn't hurt.
Signed-off-by: NIlpo Jrvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

96a2d41a

14 11月, 2007 2 次提交

[TCP] FRTO: Plug potential LOST-bit leak · 23aeeec3

由 Ilpo Jrvinen 提交于 11月 13, 2007

It might be possible that, in some extreme scenario that
I just cannot now construct in my mind, end_seq <=
frto_highmark check does not match causing the lost_out
and LOST bits become out-of-sync due to clearing and
recounting in the loop.

This may fix LOST-bit leak reported by Chazarain Guillaume
<guichaz@yahoo.fr>.
Signed-off-by: NIlpo Jrvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

23aeeec3

[TCP] FRTO: Limit snd_cwnd if TCP was application limited · 746aa32d

由 Ilpo Jrvinen 提交于 11月 13, 2007

Otherwise TCP might violate packet ordering principles that FRTO
is based on. If conventional recovery path is chosen, this won't
be significant at all. In practice, any small enough value will
be sufficient to provide proper operation for FRTO, yet other
users of snd_cwnd might benefit from a "close enough" value.

FRTO's formula is now equal to what tcp_enter_cwr() uses.

FRTO used to check application limitedness a bit differently but
I changed that in commit 575ee714
and as a result checking for application limitedness became
completely non-existing.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

746aa32d

13 11月, 2007 3 次提交

[NETFILTER]: nf_nat: fix memset error · e0bf9cf1

由 Li Zefan 提交于 11月 13, 2007

The size passing to memset is the size of a pointer.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0bf9cf1

[INET]: Use list_head-s in inetpeer.c · d71209de

由 Pavel Emelyanov 提交于 11月 12, 2007

The inetpeer.c tracks the LRU list of inet_perr-s, but makes
it by hands. Use the list_head-s for this.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d71209de

[IPVS]: Remove unused exports. · 22649d1a

由 Adrian Bunk 提交于 11月 12, 2007

This patch removes the following unused EXPORT_SYMBOL's:
- ip_vs_try_bind_dest
- ip_vs_find_dest
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22649d1a

11 11月, 2007 9 次提交

[INET]: Small possible memory leak in FIB rules · 2994c638

由 Denis V. Lunev 提交于 11月 10, 2007

This patch fixes a small memory leak. Default fib rules can be deleted by
the user if the rule does not carry FIB_RULE_PERMANENT flag, f.e. by
	ip rule flush

Such a rule will not be freed as the ref-counter has 2 on start and becomes
clearly unreachable after removal.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Acked-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2994c638

[INET]: Cleanup the xfrm4_tunnel_(un)register · 358352b8

由 Pavel Emelyanov 提交于 11月 10, 2007

Both check for the family to select an appropriate tunnel list.
Consolidate this check and make the for() loop more readable.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

358352b8

[INET]: Add missed tunnel64_err handler · 99f93326

由 Pavel Emelyanov 提交于 11月 10, 2007

The tunnel64_protocol uses the tunnel4_protocol's err_handler and
thus calls the tunnel4_protocol's handlers.

This is not very good, as in case of (icmp) error the wrong error
handlers will be called (e.g. ipip ones instead of sit) and this
won't be noticed at all, because the error is not reported.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99f93326

[NET]: Make helper to get dst entry and "use" it · 03f49f34

由 Pavel Emelyanov 提交于 11月 10, 2007

There are many places that get the dst entry, increase the
__use counter and set the "lastuse" time stamp.

Make a helper for this.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03f49f34

[IPV4]: Remove bugus goto-s from ip_route_input_slow · b1667609

由 Pavel Emelyanov 提交于 11月 10, 2007

Both places look like

        if (err == XXX) 
               goto yyy;
   done:

while both yyy targets look like

        err = XXX;
        goto done;

so this is ok to remove the above if-s.

yyy labels are used in other places and are not removed.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b1667609

[TCP]: Split SACK FRTO flag clearing (fixes FRTO corner case bug) · fbd52eb2

由 Ilpo Jrvinen 提交于 11月 10, 2007

In case we run out of mem when fragmenting, the clearing of
FLAG_ONLY_ORIG_SACKED might get missed which then feeds FRTO
with false information. Move clearing outside skb processing
loop so that it will get executed even if the skb loop
terminates prematurely due to out-of-mem.

Besides, now the core of the loop truly deals with a single
skb only, which also enables creation a more self-contained
of tcp_sacktag_one later on.

In addition, small reorganization of if branches was made.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fbd52eb2

I
[TCP]: Add unlikely() to sacktag out-of-mem in fragment case · e49aa5d4
由 Ilpo Jrvinen 提交于 11月 10, 2007
```
Signed-off-by: NIlpo Jrvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
e49aa5d4

[TCP]: Fix reord detection due to snd_una covered holes · c7caf8d3

由 Ilpo Jrvinen 提交于 11月 10, 2007

Fixes subtle bug like the one with fastpath_cnt_hint happening
due to the way the GSO and hints interact. Because hints are not
reset when just a GSOed skb is partially ACKed, there's no
guarantee that the relevant part of the write queue is going to
be processed in sacktag at all (skbs below snd_una) because
fastpath hint can fast forward the entrypoint.

This was also on the way of future reductions in sacktag's skb
processing. Also future cleanups in sacktag can be made after
this (in 2.6.25).

This may make reordering update in tcp_try_undo_partial
redundant but I'm not too sure so I left it there.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c7caf8d3

[TCP]: Consider GSO while counting reord in sacktag · 8dd71c5d

由 Ilpo Jrvinen 提交于 11月 10, 2007

Reordering detection fails to take account that the reordered
skb may have pcount larger than 1. In such case the lowest of
them had the largest reordering, the old formula used the
highest of them which is pcount - 1 packets less reordered.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8dd71c5d

07 11月, 2007 10 次提交

[INET]: Remove per bucket rwlock in tcp/dccp ehash table. · 230140cf

由 Eric Dumazet 提交于 11月 07, 2007

As done two years ago on IP route cache table (commit
22c047cc) , we can avoid using one
lock per hash bucket for the huge TCP/DCCP hash tables.

On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for
litle performance differences. (we hit a different cache line for the
rwlock, but then the bucket cache line have a better sharing factor
among cpus, since we dirty it less often). For netstat or ss commands
that want a full scan of hash table, we perform fewer memory accesses.

Using a 'small' table of hashed rwlocks should be more than enough to
provide correct SMP concurrency between different buckets, without
using too much memory. Sizing of this table depends on
num_possible_cpus() and various CONFIG settings.

This patch provides some locking abstraction that may ease a future
work using a different model for TCP/DCCP table.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

230140cf

[IPVS]: Synchronize closing of Connections · efac5276

由 Rumen G. Bogdanovski 提交于 11月 07, 2007

This patch makes the master daemon to sync the connection when it is about
to close.  This makes the connections on the backup to close or timeout
according their state.  Before the sync was performed only if the
connection is in ESTABLISHED state which always made the connections to
timeout in the hard coded 3 minutes. However the Andy Gospodarek's patch
([IPVS]: use proper timeout instead of fixed value) effectively did nothing
more than increasing this to 15 minutes (Established state timeout).  So
this patch makes use of proper timeout since it syncs the connections on
status changes to FIN_WAIT (2min timeout) and CLOSE (10sec timeout).
However if the backup misses CLOSE hopefully it did not miss FIN_WAIT.
Otherwise we will just have to wait for the ESTABLISHED state timeout. As
it is without this patch.  This way the number of the hanging connections
on the backup is kept to minimum. And very few of them will be left to
timeout with a long timeout.

This is important if we want to make use of the fix for the real server
overcommit on master/backup fail-over.
Signed-off-by: NRumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

efac5276

[IPVS]: Bind connections on stanby if the destination exists · 1e356f9c

由 Rumen G. Bogdanovski 提交于 11月 07, 2007

This patch fixes the problem with node overload on director fail-over.
Given the scenario: 2 nodes each accepting 3 connections at a time and 2
directors, director failover occurs when the nodes are fully loaded (6
connections to the cluster) in this case the new director will assign
another 6 connections to the cluster, If the same real servers exist
there.

The problem turned to be in not binding the inherited connections to
the real servers (destinations) on the backup director. Therefore:
"ipvsadm -l" reports 0 connections:
root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          0
  -> node484.local:5999           Route   1000   0          0

while "ipvs -lnc" is right
root@test2:~# ipvsadm -lnc
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999
192.168.0.51:5999
TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999
192.168.0.52:5999

So the patch I am sending fixes the problem by binding the received
connections to the appropriate service on the backup director, if it
exists, else the connection will be handled the old way. So if the
master and the backup directors are synchronized in terms of real
services there will be no problem with server over-committing since
new connections will not be created on the nonexistent real services
on the backup. However if the service is created later on the backup,
the binding will be performed when the next connection update is
received. With this patch the inherited connections will show as
inactive on the backup:

root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          1
  -> node484.local:5999           Route   1000   0          1

rumen@test2:~$ cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  C0A800DE:176F wlc
  -> C0A80033:176F      Route   1000   0          1
  -> C0A80032:176F      Route   1000   0          1

Regards,
Rumen Bogdanovski
Acked-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NRumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: NSimon Horman <horms@verge.net.au>

1e356f9c

[IPSEC]: Fix crypto_alloc_comp error checking · 4999f362

由 Herbert Xu 提交于 11月 07, 2007

The function crypto_alloc_comp returns an errno instead of NULL
to indicate error.  So it needs to be tested with IS_ERR.

This is based on a patch by Vicen Beltran Querol.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4999f362

[IPV4]: Compact some ifdefs in the fib code. · c3e9a353

由 Pavel Emelyanov 提交于 11月 06, 2007

There are places that check for CONFIG_IP_MULTIPLE_TABLES
twice in the same file, but the internals of these #ifdefs
can be merged.

As a side effect - remove one ifdef from inside a function.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3e9a353

[IPV4]: Use the {DEFINE|REF}_PROTO_INUSE infrastructure · 47a31a6f

由 Eric Dumazet 提交于 11月 05, 2007

Trivial patch to make "tcp,udp,udplite,raw" protocols uses the fast
"inuse sockets" infrastructure

Each protocol use then a static percpu var, instead of a dynamic one.
This saves some ram and some cpu cycles
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

47a31a6f

[NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way. · 286ab3d4

由 Eric Dumazet 提交于 11月 05, 2007

"struct proto" currently uses an array stats[NR_CPUS] to track change on
'inuse' sockets per protocol.

If NR_CPUS is big, this means we use a big memory area for this.
Moreover, all this memory area is located on a single node on NUMA
machines, increasing memory pressure on the boot node.

In this patch, I tried to :

- Keep a fast !CONFIG_SMP implementation
- Keep a fast CONFIG_SMP implementation for often used protocols
(tcp,udp,raw,...)
- Introduce a NUMA efficient implementation

Some helper macros are defined in include/net/sock.h
These macros take into account CONFIG_SMP

If a "struct proto" is declared without using DEFINE_PROTO_INUSE /
REF_PROTO_INUSE
macros, it will automatically use a default implementation, using a
dynamically allocated percpu zone.
This default implementation will be NUMA efficient, but might use 32/64
bytes per possible cpu
because of current alloc_percpu() implementation.
However it still should be better than previous implementation based on
stats[NR_CPUS] field.

When a "struct proto" is changed to use the new macros, we use a single
static "int" percpu variable,
lowering the memory and cpu costs, still preserving NUMA efficiency.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

286ab3d4

[IPV4]: Clean the ip_sockglue.c from some ugly ifdefs · 6a9fb947

由 Pavel Emelyanov 提交于 11月 05, 2007

The #idfed CONFIG_IP_MROUTE is sometimes places inside the if-s,
which looks completely bad. Similar ifdefs inside the functions
looks a bit better, but they are also not recommended to be used.

Provide an ifdef-ed ip_mroute_opt() helper to cleanup the code.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a9fb947

[IPV4]: Consolidate the ip cork destruction in ip_output.c · 429f08e9

由 Pavel Emelyanov 提交于 11月 05, 2007

The ip_push_pending_frames and ip_flush_pending_frames do the
same things to flush the sock's cork. Move this into a separate
function and save ~80 bytes from the .text
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

429f08e9

[NETFILTER]: remove unneeded rcu_dereference() calls · d1332e0a

由 Patrick McHardy 提交于 11月 05, 2007

As noticed by Paul McKenney, the rcu_dereference calls in the init path
of NAT modules are unneeded, remove them.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1332e0a

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功