Commits · 02f1c89d6e36507476f78108a3dcc78538be460b · openanolis / cloud-kernel

09 Jan, 2008 2 commits

[NET]: Clone the sk_buff 'iif' field in __skb_clone() · 02f1c89d

Committed by Paul Moore on Jan 07, 2008

Both NetLabel and SELinux (other LSMs may grow to use it as well) rely
on the 'iif' field to determine the receiving network interface of
inbound packets.  Unfortunately, at present this field is not
preserved across a skb clone operation which can lead to garbage
values if the cloned skb is sent back through the network stack.  This
patch corrects this problem by properly copying the 'iif' field in
__skb_clone() and removing the 'iif' field assignment from
skb_act_clone() since it is no longer needed.

Also, while we are here, put the assignments in the same order as the
offsets to reduce cacheline bounces.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
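
The effect of the 'iif' fix can be illustrated with a small userspace model. This is only a sketch, not the kernel code: `struct pkt` and `pkt_clone()` are hypothetical stand-ins for `struct sk_buff` and `__skb_clone()`, and only the fields needed for the illustration are modelled.

```c
/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct pkt {
	int iif;            /* index of the receiving interface */
	unsigned int len;   /* payload length */
	int cloned;         /* set once the buffer has been cloned */
};

/*
 * Model of a clone helper that copies every field the clone relies on.
 * 'n' is assumed to be uninitialised memory, so any field that is not
 * assigned here keeps a garbage value. Before the fix, the equivalent
 * of the 'iif' assignment was missing.
 */
static void pkt_clone(struct pkt *n, struct pkt *orig)
{
	n->iif = orig->iif;   /* the kind of assignment the patch adds */
	n->len = orig->len;
	n->cloned = 1;
	orig->cloned = 1;
}
```

With the assignment in place, a clone that is sent back through the stack reports the same receiving interface as the original.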

[SCTP]: Fix the name of the authentication event. · f691724c

Committed by Vlad Yasevich on Jan 07, 2008

The event should be called SCTP_AUTHENTICATION_INDICATION.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Dec, 2007 2 commits

[VETH]: move veth.h to include/linux · ecef969e

Committed by Stephen Hemminger on Dec 25, 2007

Move veth.h from net/ to linux/ since it is a user api, and add it to
user header processing Kbuild.

[ Use header-y as suggested by Sam Ravnborg.  -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: nf_conntrack_ipv4: fix module parameter compatibility · fae718dd

Committed by Patrick McHardy on Dec 24, 2007

Some users do "modprobe ip_conntrack hashsize=...". Because of the
module aliases, this loads nf_conntrack_ipv4 and nf_conntrack. However,
nf_conntrack_ipv4 does not know the hashsize parameter, so loading
fails.

Allow hashsize= to be specified for both nf_conntrack and nf_conntrack_ipv4.

Note: the nf_conntrack message in the ringbuffer will display an
incorrect hashsize since nf_conntrack is first pulled in as a
dependency and calculates the size itself, then it gets changed
through a call to nf_conntrack_set_hashsize().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Dec, 2007 1 commit

[NET] include/net/: Spelling fixes · f4ab2f72

Committed by Joe Perches on Dec 20, 2007

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Dec, 2007 2 commits

[SCTP]: Fix the bind_addr info during migration. · 8e71a11c

Committed by Vlad Yasevich on Dec 06, 2007

During accept/migrate the code attempts to copy the addresses from
the parent endpoint to the new endpoint.   However, if the parent
was bound to a wildcard address, then we end up pointlessly copying
all of the current addresses on the system.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: Remove prototype of ip_rt_advice · 56c99d04

Committed by Denis V. Lunev on Dec 06, 2007

ip_rt_advice is gone, so there is no need to keep its prototype and debug message.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Nov, 2007 1 commit

SCTP: Fix build issues with SCTP AUTH. · b7e0fe9f

Committed by Vlad Yasevich on Nov 29, 2007

SCTP-AUTH requires selection of CRYPTO, HMAC and SHA1 since
SHA1 is a MUST requirement for AUTH.  We also support SHA256,
but that's optional, so fix the code to treat it as such.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

26 Nov, 2007 1 commit

[IPV4]: Fix memory leak in inet_hashtables.h when NUMA is on · 218ad12f

Committed by Pavel Emelyanov on Nov 26, 2007

The inet_ehash_locks_alloc() looks like this:

#ifdef CONFIG_NUMA
	if (size > PAGE_SIZE)
		x = vmalloc(...);
	else
#endif
		x = kmalloc(...);

In contrast, inet_ehash_locks_free() looks like this:

#ifdef CONFIG_NUMA
	if (size > PAGE_SIZE)
		vfree(x);
	else
#else
		kfree(x);
#endif

The error is obvious: if NUMA is on and the size
is not greater than PAGE_SIZE, we leak the pointer (kfree is
inside the #else branch).

Compiler doesn't warn us because after the kfree(x) there's
a "x = NULL" assignment, so here's another (minor?) bug: we 
don't set x to NULL under certain circumstances.

Boring explanation, I know... Patch explains it better.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
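
The corrected alloc/free pairing can be sketched in userspace as follows. This is an illustration under stated assumptions, not the actual patch: `fake_vmalloc()`, `fake_kmalloc()` and their free counterparts stand in for the kernel allocators, `struct lock_table` is hypothetical, and `CONFIG_NUMA` and `PAGE_SIZE` are defined locally to model a NUMA build with 4 KiB pages.

```c
#include <stddef.h>
#include <stdlib.h>

#define CONFIG_NUMA 1          /* assumption: model a NUMA build */
#define PAGE_SIZE   4096u      /* assumption: 4 KiB pages */

static int live_allocs;        /* outstanding allocations, for leak checks */

/* Userspace stand-ins for the kernel allocators. */
static void *fake_vmalloc(size_t n) { live_allocs++; return malloc(n); }
static void *fake_kmalloc(size_t n) { live_allocs++; return malloc(n); }
static void fake_vfree(void *p) { live_allocs--; free(p); }
static void fake_kfree(void *p) { live_allocs--; free(p); }

struct lock_table {
	void  *locks;
	size_t size;
};

static void lock_table_alloc(struct lock_table *t, size_t size)
{
	t->size = size;
#ifdef CONFIG_NUMA
	if (size > PAGE_SIZE)
		t->locks = fake_vmalloc(size);
	else
#endif
		t->locks = fake_kmalloc(size);
}

static void lock_table_free(struct lock_table *t)
{
	if (!t->locks)
		return;
#ifdef CONFIG_NUMA
	if (t->size > PAGE_SIZE)
		fake_vfree(t->locks);
	else
#endif	/* with "#else" here, kfree() is compiled out on NUMA builds */
		fake_kfree(t->locks);
	t->locks = NULL;	/* reached on every path once the #endif is moved */
}
```

The free path mirrors the allocation path exactly, so both the small (kmalloc) and the large (vmalloc) case release their memory and reset the pointer.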

21 Nov, 2007 1 commit

ieee80211: Stop net_ratelimit/IEEE80211_DEBUG_DROP log pollution · 92468c53

Committed by Guillaume Chazarain on Nov 19, 2007

if (net_ratelimit())
	IEEE80211_DEBUG_DROP(...)

can pollute the logs with messages like:

printk: 1 messages suppressed.
printk: 2 messages suppressed.
printk: 7 messages suppressed.

if debugging information is disabled. These messages are printed by
net_ratelimit(). Add a wrapper to net_ratelimit() that takes into account
the log level, so that net_ratelimit() is called only when we really want
to print something.
Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
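
The idea of the wrapper can be sketched like this. The names `debug_level`, `DEBUG_DROP`, `fake_net_ratelimit()` and `debug_ratelimit()` are assumptions made for the illustration; the real helper in the patch may look different.

```c
#define DEBUG_DROP (1u << 0)       /* hypothetical debug-level bit */

static unsigned int debug_level;   /* bitmask of enabled debug classes */
static int ratelimit_calls;        /* how often the limiter was consulted */

/* Stand-in for net_ratelimit(), which may log "messages suppressed". */
static int fake_net_ratelimit(void)
{
	ratelimit_calls++;
	return 1;
}

/*
 * Consult the rate limiter only if the message would actually be printed
 * at the current debug level. The && operator short-circuits, so a
 * disabled level never reaches fake_net_ratelimit().
 */
static int debug_ratelimit(unsigned int level)
{
	return (debug_level & level) && fake_net_ratelimit();
}
```

Because the level check comes first, the limiter keeps no suppression count for messages that would never have been printed.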

20 Nov, 2007 4 commits

[TCP] MTUprobe: fix potential sk_send_head corruption · 6e421410

Committed by Ilpo Järvinen on Nov 19, 2007

When the abstraction functions got added, the conversion here was
made incorrectly. As a result, sk_send_head may end up pointing
to an skb which got included in the probe skb and was then freed.
For it to trigger, however, skb_transmit must fail sending as
well.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPVS]: Move remaining sysctl handlers over to CTL_UNNUMBERED · 9055fa1f

Committed by Simon Horman on Nov 19, 2007

Switch the remaining IPVS sysctl entries over to use CTL_UNNUMBERED.
I strongly doubt that anyone is using the sys_sysctl interface to
these variables.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPVS]: Fix sysctl warnings about missing strategy in schedulers · 9e103fa6

Committed by Simon Horman on Nov 19, 2007

sysctl table check failed: /net/ipv4/vs/lblc_expiration .3.5.21.19 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/lblcr_expiration .3.5.21.20 Missing strategy

Switch these entries over to use CTL_UNNUMBERED as clearly
the sys_sysctl portion wasn't working.

This is along the same lines as Christian Borntraeger's patch that fixes
up entries with no strategy in net/ipv4/ipvs/ip_vs_ctl.c
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPVS]: Fix sysctl warnings about missing strategy · 611cd55b

Committed by Christian Borntraeger on Nov 19, 2007

Running the latest git code I get the following messages during boot:
sysctl table check failed: /net/ipv4/vs/drop_entry .3.5.21.4 Missing strategy
[...]		  
sysctl table check failed: /net/ipv4/vs/drop_packet .3.5.21.5 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/secure_tcp .3.5.21.6 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/sync_threshold .3.5.21.24 Missing strategy

I removed the binary sysctl handler for those messages and also removed
the definitions in ip_vs.h. The alternative would be to implement a 
proper strategy handler, but syscall sysctl is deprecated.

There are other sysctl definitions that are commented out or work with 
the default sysctl_data strategy. I did not touch these. 
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Nov, 2007 1 commit

[TCP]: Fix TCP header misalignment · 21df56c6

Committed by Herbert Xu on Nov 18, 2007

Indeed my previous change to alloc_pskb has made it possible
for the TCP header to be misaligned iff the MTU is not a multiple
of 4 (and less than a page).  So I suspect the optimised IPsec
MTU calculation is giving you just such an MTU :)

This patch fixes it by changing alloc_pskb to make sure that
the size is at least 32-bit aligned.  This does not cause the
problem fixed by the previous patch because max_header is always
32-bit aligned which means that in the SG/NOTSO case this will
be a no-op.

I thought about putting this in the callers but all the current
callers are from TCP.  If and when we get a non-TCP caller we
can always create a TCP wrapper for this function and move the
alignment over there.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
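
Rounding a size up to 32-bit alignment can be sketched as below. `align4()` is a hypothetical helper written for this illustration; the kernel has its own `ALIGN()` macro for the same purpose.

```c
#include <stddef.h>

/*
 * Round 'size' up to the next multiple of 4 (32-bit alignment).
 * A size that is already a multiple of 4 is returned unchanged,
 * which is why the rounding is a no-op for aligned inputs.
 */
static size_t align4(size_t size)
{
	return (size + 3) & ~(size_t)3;
}
```

For example, a size of 1397 bytes, which is not a multiple of 4, is rounded up to 1400, while 1400 stays 1400.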

15 Nov, 2007 2 commits

[INET]: Fix potential kfree on vmalloc-ed area of request_sock_queue · dab6ba36

Committed by Pavel Emelyanov on Nov 15, 2007

The request_sock_queue's listen_opt is either vmalloc-ed or
kmalloc-ed depending on the number of table entries. Thus it 
is expected to be handled properly on free, which is done in 
the reqsk_queue_destroy().

However the error path in inet_csk_listen_start() calls 
the lite version of reqsk_queue_destroy, called 
__reqsk_queue_destroy, which calls the kfree unconditionally. 

Fix this and move the __reqsk_queue_destroy into a .c file as 
it looks too big to be inline.

As David also noticed, this is an error recovery path only,
so no locking is required and the lopt is known to be not NULL.

reqsk_queue_yank_listen_sk is also now only used in
net/core/request_sock.c so we should move it there too.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Fix size calculation in sk_stream_alloc_pskb · fb93134d

Committed by Herbert Xu on Nov 14, 2007

We round up the header size in sk_stream_alloc_pskb so that
TSO packets get zero tail room.  Unfortunately this rounding
up is not coordinated with the select_size() function used by
TCP to calculate the second parameter of sk_stream_alloc_pskb.

As a result, we may allocate more than a page of data in the
non-TSO case when exactly one page is desired.

In fact, rounding up the head room is detrimental in the non-TSO
case because it makes memory that would otherwise be available to
the payload head room.  TSO doesn't need this either, all it wants
is the guarantee that there is no tail room.

So this patch fixes this by adjusting the skb_reserve call so that
exactly the requested amount (which all callers have calculated in
a precise way) is made available as tail room.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
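
The arithmetic behind "exactly the requested amount as tail room" can be sketched with a simple buffer model. `struct buf` and both helpers are hypothetical; they only mimic the head room / tail room bookkeeping and are not the kernel's skb functions.

```c
#include <stddef.h>

/* Hypothetical model of a linear buffer: [headroom | data | tailroom]. */
struct buf {
	size_t capacity;   /* bytes actually allocated (may be rounded up) */
	size_t headroom;   /* bytes reserved in front of the data */
	size_t data_len;   /* bytes of data currently stored */
};

static size_t buf_tailroom(const struct buf *b)
{
	return b->capacity - b->headroom - b->data_len;
}

/*
 * Reserve everything except the requested payload size as head room,
 * so that exactly 'size' bytes remain as tail room no matter how far
 * the allocator rounded the capacity up. Assumes size <= tail room.
 */
static void buf_reserve_for_payload(struct buf *b, size_t size)
{
	b->headroom += buf_tailroom(b) - size;
}
```

With a capacity of 4096 + 192 bytes and a requested size of 4096, the model ends up with 192 bytes of head room and exactly 4096 bytes of tail room.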

13 Nov, 2007 3 commits

[NET]: Move unneeded data to initdata section. · 022cbae6

Committed by Denis V. Lunev on Nov 13, 2007

This patch reverts Eric's commit 2b008b0a

It slims the .text and .data sections of the kernel if CONFIG_NET_NS is not set.
This is safe after the list operations cleanup.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[INET]: Use list_head-s in inetpeer.c · d71209de

Committed by Pavel Emelyanov on Nov 12, 2007

inetpeer.c tracks the LRU list of inet_peer-s, but does
it by hand. Use list_head-s for this.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[INET]: Remove leftover prototypes from include/net/inet_common.h · c0d82487

Committed by Arnaldo Carvalho de Melo on Nov 12, 2007

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Nov, 2007 6 commits

[INET]: Small possible memory leak in FIB rules · 2994c638

Committed by Denis V. Lunev on Nov 10, 2007

This patch fixes a small memory leak. Default fib rules can be deleted by
the user if the rule does not carry the FIB_RULE_PERMANENT flag, e.g. by
	ip rule flush

Such a rule will not be freed, as the ref-counter starts at 2 and the rule
clearly becomes unreachable after removal.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

[AF_UNIX]: Make unix_tot_inflight counter non-atomic · 9305cfa4

Committed by Pavel Emelyanov on Nov 10, 2007

This counter is _always_ modified under the unix_gc_lock spinlock,
so its atomicity can be provided w/o additional efforts.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac80211: remove unused driver ops · 56db6c52

Committed by Johannes Berg on Oct 30, 2007

The driver operations set_ieee8021x(), set_port_auth() and
set_privacy_invoked() are not used by any drivers; except for
set_privacy_invoked(), they aren't even used by mac80211.
Remove them at least until we need to support drivers with
mac80211 that require getting this information.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: allow driver to ask for a rate control algorithm · 830f9038

Committed by Johannes Berg on Oct 28, 2007

This allows a driver to ask for a specific rate control algorithm.
The rate control algorithm asked for must be registered and be
available as a module or built-in.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

[NET]: Make helper to get dst entry and "use" it · 03f49f34

Committed by Pavel Emelyanov on Nov 10, 2007

There are many places that get the dst entry, increase the
__use counter and set the "lastuse" time stamp.

Make a helper for this.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
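
A helper of this kind can be sketched as follows. `struct dst` and `dst_get_and_use()` are hypothetical names for this illustration, and the reference count is a plain int here, whereas kernel code would use an atomic counter.

```c
/* Hypothetical, reduced stand-in for struct dst_entry. */
struct dst {
	int           refcnt;    /* reference count */
	int           use;       /* number of times the entry was used */
	unsigned long lastuse;   /* time stamp of the most recent use */
};

/* Take a reference and record the use in one place. */
static void dst_get_and_use(struct dst *d, unsigned long now)
{
	d->refcnt++;
	d->use++;
	d->lastuse = now;
}
```

Callers that previously repeated these three steps can call the helper instead, which keeps the bookkeeping consistent across all call sites.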

[INET]: Add a missing include <linux/vmalloc.h> to inet_hashtables.h · 9e4505c4

Committed by Eric Dumazet on Nov 10, 2007

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Nov, 2007 3 commits

SCTP: Clean-up some defines for regressions tests. · fa7ff654

Committed by Vlad Yasevich on Nov 09, 2007

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

SCTP: Make sctp_verify_param return multiple indications. · 7ab90804

Committed by Vlad Yasevich on Nov 09, 2007

SCTP-AUTH and future ADD-IP updates have a requirement to
do additional verification of parameters and an ability to
ABORT the association if verification fails.  So, introduce
additional return codes so that we can clearly signal a required
action.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

SCTP: Convert custom hash lists to use hlist. · d970dbf8

Committed by Vlad Yasevich on Nov 09, 2007

Convert the custom hash list traversals to use hlist functions.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

08 Nov, 2007 4 commits

SCTP: Allow ADD_IP to work without AUTH for backward compatibility. · 73d9c4fd

Committed by Vlad Yasevich on Oct 24, 2007

This patch adds a tunable that will allow ADD_IP to work without
AUTH for backward compatibility. The default value is off since
the default value for ADD_IP is off as well. People who need
to use ADD-IP with older implementations take risks of connection
hijacking and should consider upgrading or turning this tunable on.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

SCTP: Correctly disable ADD-IP when AUTH is not supported. · 88799fe5

Committed by Vlad Yasevich on Oct 24, 2007

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

SCTP: Update RCU handling during the ADD-IP case · 0ed90fb0

Committed by Vlad Yasevich on Oct 24, 2007

After learning more about rcu, it looks like the ADD-IP handling
doesn't need to call call_rcu_bh.  All the rcu critical sections
use rcu_read_lock, so using call_rcu_bh is wrong here.
Now, restore the local_bh_disable() code blocks and use normal
call_rcu() calls.  Also restore the missing return statement.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

SCTP: Fix different cases of retransmit. · b6157d8e

Committed by Vlad Yasevich on Oct 24, 2007

Commit d0ce9291 broke several retransmit
cases including fast retransmit.  The reason is that we should
only delay by rto while doing retransmits as a result of a timeout.
Retransmits as a result of path mtu discovery, fast retransmit, or
other events that should trigger immediate retransmissions got broken.

Also, since rto is doubled prior to marking of packets eligible for
retransmission, we never marked correct chunks anyway.

The fix is to provide a reason for a given retransmission so that we
can mark chunks appropriately and to save the old rto value to do
comparisons against.

All regression tests passed with this code.

Spotted by Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
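
The role of the retransmission reason can be sketched as follows. The enum values and `chunk_eligible()` are hypothetical names made up for this illustration; they only model the rule that timeout-driven retransmits wait for one RTO (compared against the value saved before doubling) while other reasons mark chunks immediately.

```c
/* Hypothetical reason codes; the names are illustrative only. */
enum rtx_reason {
	RTX_TIMEOUT,   /* retransmission timer expired */
	RTX_FAST,      /* fast retransmit */
	RTX_PMTUD      /* path MTU change */
};

/*
 * Decide whether a chunk sent at 'sent_at' is eligible for retransmission
 * at time 'now'. Only timeout-driven retransmits wait for one RTO;
 * 'old_rto' is the value saved before the RTO was doubled.
 */
static int chunk_eligible(enum rtx_reason reason, unsigned long sent_at,
			  unsigned long now, unsigned long old_rto)
{
	if (reason != RTX_TIMEOUT)
		return 1;
	return now - sent_at >= old_rto;
}
```

Passing the reason explicitly keeps fast retransmit and PMTU-driven retransmits from being delayed by a check that only makes sense for timeouts.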

07 Nov, 2007 5 commits

[INET]: Remove per bucket rwlock in tcp/dccp ehash table. · 230140cf

Committed by Eric Dumazet on Nov 07, 2007

As done two years ago on the IP route cache table (commit
22c047cc), we can avoid using one
lock per hash bucket for the huge TCP/DCCP hash tables.

On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, for
little performance difference. (We hit a different cache line for the
rwlock, but then the bucket cache line has a better sharing factor
among cpus, since we dirty it less often.) For netstat or ss commands
that want a full scan of the hash table, we perform fewer memory accesses.

Using a 'small' table of hashed rwlocks should be more than enough to
provide correct SMP concurrency between different buckets, without
using too much memory. Sizing of this table depends on
num_possible_cpus() and various CONFIG settings.

This patch provides some locking abstraction that may ease a future
work using a different model for TCP/DCCP table.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
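
The mapping from many buckets to a small table of locks can be sketched like this. The sizes, `struct fake_lock` and both helpers are assumptions for the illustration; the real table holds rwlocks and is sized from num_possible_cpus() and CONFIG settings.

```c
#define NR_BUCKETS 1024u   /* assumption: power-of-two bucket count */
#define NR_LOCKS   16u     /* assumption: power-of-two lock count */

/* Stand-in for an rwlock; a counter is enough to show the mapping. */
struct fake_lock { int acquisitions; };

static struct fake_lock lock_table[NR_LOCKS];

/* Map a hash value to its bucket index. */
static unsigned int bucket_index(unsigned int hash)
{
	return hash & (NR_BUCKETS - 1);
}

/* Map the same hash to one of the few shared locks. */
static struct fake_lock *bucket_lock(unsigned int hash)
{
	return &lock_table[hash & (NR_LOCKS - 1)];
}
```

Different buckets may share a lock, which costs a little concurrency but shrinks lock memory from one lock per bucket to a small fixed table.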

[IPVS]: Synchronize closing of Connections · efac5276

Committed by Rumen G. Bogdanovski on Nov 07, 2007

This patch makes the master daemon sync the connection when it is about
to close.  This makes the connections on the backup close or time out
according to their state.  Before, the sync was performed only if the
connection was in ESTABLISHED state, which always made the connections
time out after the hard-coded 3 minutes. However, Andy Gospodarek's patch
([IPVS]: use proper timeout instead of fixed value) effectively did nothing
more than increase this to 15 minutes (Established state timeout).  So
this patch makes use of the proper timeout, since it syncs the connections on
status changes to FIN_WAIT (2min timeout) and CLOSE (10sec timeout).
If the backup misses CLOSE, hopefully it did not miss FIN_WAIT.
Otherwise we will just have to wait for the ESTABLISHED state timeout, as
is the case without this patch.  This way the number of hanging connections
on the backup is kept to a minimum, and very few of them will be left to
expire with a long timeout.

This is important if we want to make use of the fix for the real server
overcommit on master/backup fail-over.
Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPVS]: Bind connections on standby if the destination exists · 1e356f9c

Committed by Rumen G. Bogdanovski on Nov 07, 2007

This patch fixes the problem with node overload on director fail-over.
Given the scenario: 2 nodes each accepting 3 connections at a time and 2
directors. If director failover occurs when the nodes are fully loaded (6
connections to the cluster), the new director will assign
another 6 connections to the cluster, if the same real servers exist
there.

The problem turned out to be in not binding the inherited connections to
the real servers (destinations) on the backup director. Therefore
"ipvsadm -l" reports 0 connections:
root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          0
  -> node484.local:5999           Route   1000   0          0

while "ipvsadm -lnc" is right:
root@test2:~# ipvsadm -lnc
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999
TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999

So the patch I am sending fixes the problem by binding the received
connections to the appropriate service on the backup director, if it
exists, else the connection will be handled the old way. So if the
master and the backup directors are synchronized in terms of real
services there will be no problem with server over-committing since
new connections will not be created on the nonexistent real services
on the backup. However if the service is created later on the backup,
the binding will be performed when the next connection update is
received. With this patch the inherited connections will show as
inactive on the backup:

root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          1
  -> node484.local:5999           Route   1000   0          1

rumen@test2:~$ cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  C0A800DE:176F wlc
  -> C0A80033:176F      Route   1000   0          1
  -> C0A80032:176F      Route   1000   0          1

Regards,
Rumen Bogdanovski
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

[IPV4]: Compact some ifdefs in the fib code. · c3e9a353

Committed by Pavel Emelyanov on Nov 06, 2007

There are places that check for CONFIG_IP_MULTIPLE_TABLES
twice in the same file, but the internals of these #ifdefs
can be merged.

As a side effect - remove one ifdef from inside a function.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Define infrastructure to keep 'inuse' changes in an efficient SMP/NUMA way. · 286ab3d4

Committed by Eric Dumazet on Nov 05, 2007

"struct proto" currently uses an array stats[NR_CPUS] to track change on
'inuse' sockets per protocol.

If NR_CPUS is big, this means we use a big memory area for this.
Moreover, all this memory area is located on a single node on NUMA
machines, increasing memory pressure on the boot node.

In this patch, I tried to:

- Keep a fast !CONFIG_SMP implementation
- Keep a fast CONFIG_SMP implementation for often used protocols
  (tcp,udp,raw,...)
- Introduce a NUMA efficient implementation

Some helper macros are defined in include/net/sock.h
These macros take into account CONFIG_SMP

If a "struct proto" is declared without using the DEFINE_PROTO_INUSE /
REF_PROTO_INUSE macros, it will automatically use a default
implementation, using a dynamically allocated percpu zone.
This default implementation will be NUMA efficient, but might use 32/64
bytes per possible cpu because of the current alloc_percpu()
implementation.
However it still should be better than the previous implementation based
on the stats[NR_CPUS] field.

When a "struct proto" is changed to use the new macros, we use a single
static "int" percpu variable, lowering the memory and cpu costs, still
preserving NUMA efficiency.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
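
The per-CPU counting scheme can be sketched in plain C. This is a model only: `NR_POSSIBLE_CPUS`, `inuse_add()` and `inuse_total()` are hypothetical, the CPU number is passed in explicitly, and an ordinary array stands in for real per-cpu storage (which would place each counter in memory local to its CPU).

```c
#define NR_POSSIBLE_CPUS 4   /* assumption: fixed small CPU count */

/* One counter per CPU; the kernel would keep each in that CPU's per-cpu area. */
static int inuse_percpu[NR_POSSIBLE_CPUS];

/* Fast path: adjust only the calling CPU's counter. */
static void inuse_add(int cpu, int delta)
{
	inuse_percpu[cpu] += delta;
}

/* Slow path: sum all per-CPU deltas to get the protocol-wide value. */
static int inuse_total(void)
{
	int cpu, total = 0;

	for (cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++)
		total += inuse_percpu[cpu];
	return total;
}
```

An individual per-CPU value may go negative (a socket created on one CPU and destroyed on another); only the sum is meaningful.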

02 Nov, 2007 1 commit

cleanup asm/scatterlist.h includes · 87ae9afd

Committed by Adrian Bunk on Oct 30, 2007

Code that is not architecture-specific should not #include <asm/scatterlist.h>.

This patch therefore either replaces them with
#include <linux/scatterlist.h> or simply removes them if they were
unused.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

01 Nov, 2007 1 commit

[NET]: Relax the reference counting of init_net_ns · d4655795

Committed by Pavel Emelyanov on Nov 01, 2007

When CONFIG_NET_NS is n there's no need to refcount
the initial net namespace. So relax this code by adding
trivial stubs for the "n" case.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
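
The stub approach can be sketched as follows. `struct net` is reduced to a single counter and the function bodies are simplified assumptions for this illustration; `CONFIG_NET_NS` is deliberately left undefined to model the "n" case.

```c
/* Hypothetical, reduced stand-in for the namespace structure. */
struct net { int count; };

#ifdef CONFIG_NET_NS
/* Namespaces enabled: references must be counted. */
static struct net *get_net(struct net *net) { net->count++; return net; }
static void put_net(struct net *net) { net->count--; }
#else
/* Only the initial namespace exists, so there is nothing to count. */
static struct net *get_net(struct net *net) { return net; }
static void put_net(struct net *net) { (void)net; }
#endif
```

Callers stay unchanged in both configurations; with namespaces disabled the calls compile down to nothing.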

openanolis / cloud-kernel synced successfully about 1 year ago
