提交 · 1bf06cd2e338fd6fc29169d30eaf0df982338285 · openeuler / raspberrypi-kernel

29 1月, 2008 2 次提交

由 Herbert Xu 提交于 1月 11, 2008

Most callers of the LOCAL_OUT chain will set the IP packet length and
header checksum before doing so.  They also share the same output
function dst_output.

This patch creates a new function called ip_local_out which does all
of that and converts the appropriate users over to it.

Apart from removing duplicate code, it will also help in merging the
IPsec output path once the same thing is done for IPv6.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c439cb2e

[NET]: Convert init_timer into setup_timer · b24b8a24

由 Pavel Emelyanov 提交于 1月 23, 2008

Many-many code in the kernel initialized the timer->function
and  timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.

The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b24b8a24

05 12月, 2007 2 次提交

[IPVS]: Fix sched registration race when checking for name collision. · 4ac63ad6

由 Pavel Emelyanov 提交于 12月 04, 2007

The register_ip_vs_scheduler() checks for the scheduler with the
same name under the read-locked __ip_vs_sched_lock, then drops,
takes it for writing and puts the scheduler in list.

This is racy, since we can have a race window between the lock
being re-locked for writing.

The fix is to search the scheduler with the given name right under
the write-locked __ip_vs_sched_lock.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Acked-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ac63ad6

[IPVS]: Don't leak sysctl tables if the scheduler registration fails. · a014bc8f

由 Pavel Emelyanov 提交于 12月 04, 2007

In case we load lblc or lblcr module we can leak some sysctl
tables if the call to register_ip_vs_scheduler() fails.

I've looked at the register_ip_vs_scheduler() code and saw, that
the only reason to fail is the name collision, so I think that
with some 3rd party schedulers this becomes a relevant issue. No?
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Acked-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a014bc8f

21 11月, 2007 1 次提交

[IPVS]: Fix compiler warning about unused register_ip_vs_protocol · d535a916

由 Pavel Emelyanov 提交于 11月 20, 2007

This is silly, but I have turned the CONFIG_IP_VS to m,
to check the compilation of one (recently sent) fix
and set all the CONFIG_IP_VS_PROTO_XXX options to n to
speed up the compilation.

In this configuration the compiler warns me about

  CC [M]  net/ipv4/ipvs/ip_vs_proto.o
net/ipv4/ipvs/ip_vs_proto.c:49: warning: 'register_ip_vs_protocol' defined but not used

Indeed. With no protocols selected there are no
calls to this function - all are compiled out with
ifdefs.

Maybe the best fix would be to surround this call with
ifdef-s or tune the Kconfig dependences, but I think that
marking this register function as __used is enough. No?
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Acked-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d535a916

20 11月, 2007 4 次提交

[IPV4]: Add missing "space" · 464c4f18

由 Joe Perches 提交于 11月 19, 2007

Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

464c4f18

[IPVS]: Move remaining sysctl handlers over to CTL_UNNUMBERED · 9055fa1f

由 Simon Horman 提交于 11月 19, 2007

Switch the remaining IPVS sysctl entries over to to use CTL_UNNUMBERED,
I stronly doubt that anyone is using the sys_sysctl interface to
these variables.
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9055fa1f

[IPVS]: Fix sysctl warnings about missing strategy in schedulers · 9e103fa6

由 Simon Horman 提交于 11月 19, 2007

sysctl table check failed: /net/ipv4/vs/lblc_expiration .3.5.21.19 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/lblcr_expiration .3.5.21.20 Missing strategy

Switch these entried over to use CTL_UNNUMBERED as clearly
the sys_syscal portion wasn't working.

This is along the same lines as Christian Borntraeger's patch that fixes
up entries with no stratergy in net/ipv4/ipvs/ip_vs_ctl.c
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e103fa6

[IPVS]: Fix sysctl warnings about missing strategy · 611cd55b

由 Christian Borntraeger 提交于 11月 19, 2007

Running the latest git code I get the following messages during boot:
sysctl table check failed: /net/ipv4/vs/drop_entry .3.5.21.4 Missing strategy
[...]		  
sysctl table check failed: /net/ipv4/vs/drop_packet .3.5.21.5 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/secure_tcp .3.5.21.6 Missing strategy
[...]
sysctl table check failed: /net/ipv4/vs/sync_threshold .3.5.21.24 Missing strategy

I removed the binary sysctl handler for those messages and also removed
the definitions in ip_vs.h. The alternative would be to implement a 
proper strategy handler, but syscall sysctl is deprecated.

There are other sysctl definitions that are commented out or work with 
the default sysctl_data strategy. I did not touch these. 
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

611cd55b

13 11月, 2007 1 次提交

[IPVS]: Remove unused exports. · 22649d1a

由 Adrian Bunk 提交于 11月 12, 2007

This patch removes the following unused EXPORT_SYMBOL's:
- ip_vs_try_bind_dest
- ip_vs_find_dest
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22649d1a

07 11月, 2007 2 次提交

[IPVS]: Synchronize closing of Connections · efac5276

由 Rumen G. Bogdanovski 提交于 11月 07, 2007

This patch makes the master daemon to sync the connection when it is about
to close.  This makes the connections on the backup to close or timeout
according their state.  Before the sync was performed only if the
connection is in ESTABLISHED state which always made the connections to
timeout in the hard coded 3 minutes. However the Andy Gospodarek's patch
([IPVS]: use proper timeout instead of fixed value) effectively did nothing
more than increasing this to 15 minutes (Established state timeout).  So
this patch makes use of proper timeout since it syncs the connections on
status changes to FIN_WAIT (2min timeout) and CLOSE (10sec timeout).
However if the backup misses CLOSE hopefully it did not miss FIN_WAIT.
Otherwise we will just have to wait for the ESTABLISHED state timeout. As
it is without this patch.  This way the number of the hanging connections
on the backup is kept to minimum. And very few of them will be left to
timeout with a long timeout.

This is important if we want to make use of the fix for the real server
overcommit on master/backup fail-over.
Signed-off-by: NRumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

efac5276

[IPVS]: Bind connections on stanby if the destination exists · 1e356f9c

由 Rumen G. Bogdanovski 提交于 11月 07, 2007

This patch fixes the problem with node overload on director fail-over.
Given the scenario: 2 nodes each accepting 3 connections at a time and 2
directors, director failover occurs when the nodes are fully loaded (6
connections to the cluster) in this case the new director will assign
another 6 connections to the cluster, If the same real servers exist
there.

The problem turned to be in not binding the inherited connections to
the real servers (destinations) on the backup director. Therefore:
"ipvsadm -l" reports 0 connections:
root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          0
  -> node484.local:5999           Route   1000   0          0

while "ipvs -lnc" is right
root@test2:~# ipvsadm -lnc
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999
192.168.0.51:5999
TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999
192.168.0.52:5999

So the patch I am sending fixes the problem by binding the received
connections to the appropriate service on the backup director, if it
exists, else the connection will be handled the old way. So if the
master and the backup directors are synchronized in terms of real
services there will be no problem with server over-committing since
new connections will not be created on the nonexistent real services
on the backup. However if the service is created later on the backup,
the binding will be performed when the next connection update is
received. With this patch the inherited connections will show as
inactive on the backup:

root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          1
  -> node484.local:5999           Route   1000   0          1

rumen@test2:~$ cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  C0A800DE:176F wlc
  -> C0A80033:176F      Route   1000   0          1
  -> C0A80032:176F      Route   1000   0          1

Regards,
Rumen Bogdanovski
Acked-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NRumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: NSimon Horman <horms@verge.net.au>

1e356f9c

31 10月, 2007 1 次提交

[IPVS]: Remove /proc/net/ip_vs_lblcr · 07afa040

由 Alexey Dobriyan 提交于 10月 30, 2007

It's under CONFIG_IP_VS_LBLCR_DEBUG option which never existed.
Signed-off-by: NAlexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

07afa040

30 10月, 2007 1 次提交

[IPVS]: use proper timeout instead of fixed value · 5c81833c

由 Andy Gospodarek 提交于 10月 29, 2007

Instead of using the default timeout of 3 minutes, this uses the timeout
specific to the protocol used for the connection. The 3 minute timeout
seems somewhat arbitrary (though I know it is used other places in the
ipvs code) and when failing over it would be much nicer to use one of
the configured timeout values.
Signed-off-by: NAndy Gospodarek <andy@greyhouse.net>
Acked-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c81833c

24 10月, 2007 1 次提交

[NET]: Treat the sign of the result of skb_headroom() consistently · c2636b4d

由 Chuck Lever 提交于 10月 23, 2007

In some places, the result of skb_headroom() is compared to an unsigned
integer, and in others, the result is compared to a signed integer.  Make
the comparisons consistent and correct.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c2636b4d

20 10月, 2007 2 次提交

Use task_pid_nr() in ip_vs_sync.c · 6651fd56

由 Pavel Emelyanov 提交于 10月 18, 2007

The sync_master_pid and sync_backup_pid are set in set_sync_pid() and are
used later for set/not-set checks and in printk.  So it is safe to use the
global pid value in this case.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Acked-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6651fd56

Use helpers to obtain task pid in printks · ba25f9dc

由 Pavel Emelyanov 提交于 10月 18, 2007

The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.

The first thing to start with is the pid, printed to dmesg - in
this case we may safely use task_pid_nr(). Besides, printks produce
more (much more) than a half of all the explicit pid usage.

[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ba25f9dc

16 10月, 2007 3 次提交

[NETFILTER]: Replace sk_buff ** with sk_buff * · 3db05fea

由 Herbert Xu 提交于 10月 15, 2007

With all the users of the double pointers removed, this patch mops up by
finally replacing all occurances of sk_buff ** in the netfilter API by
sk_buff *.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3db05fea

[IPVS]: Replace local version of skb_make_writable · af1e1cf0

由 Herbert Xu 提交于 10月 14, 2007

This patch removes the IPVS-specific version of skb_make_writable and
replaces it with the netfilter one.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af1e1cf0

[IPV4]: Change ip_defrag to return an integer · 776c729e

由 Herbert Xu 提交于 10月 14, 2007

Now that ip_frag always returns the packet given to it on input, we can
change it to return an integer indicating error instead.  This patch does
that and updates all its callers accordingly.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

776c729e

11 10月, 2007 5 次提交

[NET]: Make core networking code use seq_open_private · cf7732e4

由 Pavel Emelyanov 提交于 10月 10, 2007

This concerns the ipv4 and ipv6 code mostly, but also the netlink
and unix sockets.

The netlink code is an example of how to use the __seq_open_private()
call - it saves the net namespace on this private.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf7732e4

[IPV4]: When possible test for IFF_LOOPBACK and not dev == loopback_dev · 0cc217e1

由 Eric W. Biederman 提交于 9月 26, 2007

Now that multiple loopback devices are becoming possible it makes
the code a little cleaner and more maintainable to test if a deivice
is th a loopback device by testing dev->flags & IFF_LOOPBACK instead
of dev == loopback_dev.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0cc217e1

[NET]: Dynamically allocate the loopback device, part 1. · de3cb747

由 Daniel Lezcano 提交于 9月 25, 2007

This patch replaces all occurences to the static variable
loopback_dev to a pointer loopback_dev. That provides the
mindless, trivial, uninteressting change part for the dynamic
allocation for the loopback.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NDaniel Lezcano <dlezcano@fr.ibm.com>
Acked-By: NKirill Korotaev <dev@sw.ru>
Acked-by: NBenjamin Thery <benjamin.thery@bull.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de3cb747

[NET]: Make the device list and device lookups per namespace. · 881d966b

由 Eric W. Biederman 提交于 9月 17, 2007

This patch makes most of the generic device layer network
namespace safe.  This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables.  The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces.  The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace.  This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change.  Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

881d966b

[NET]: Make /proc/net per network namespace · 457c4cbc

由 Eric W. Biederman 提交于 9月 12, 2007

This patch makes /proc/net per network namespace. It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

457c4cbc

11 9月, 2007 1 次提交

[NETFILTER]: Fix/improve deadlock condition on module removal netfilter · 16fcec35

由 Neil Horman 提交于 9月 11, 2007

So I've had a deadlock reported to me.  I've found that the sequence of
events goes like this:

1) process A (modprobe) runs to remove ip_tables.ko

2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket,
increasing the ip_tables socket_ops use count

3) process A acquires a file lock on the file ip_tables.ko, calls remove_module
in the kernel, which in turn executes the ip_tables module cleanup routine,
which calls nf_unregister_sockopt

4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the
calling process into uninterruptible sleep, expecting the process using the
socket option code to wake it up when it exits the kernel

4) the user of the socket option code (process B) in do_ipt_get_ctl, calls
ipt_find_table_lock, which in this case calls request_module to load
ip_tables_nat.ko

5) request_module forks a copy of modprobe (process C) to load the module and
blocks until modprobe exits.

6) Process C. forked by request_module process the dependencies of
ip_tables_nat.ko, of which ip_tables.ko is one.

7) Process C attempts to lock the request module and all its dependencies, it
blocks when it attempts to lock ip_tables.ko (which was previously locked in
step 3)

Theres not really any great permanent solution to this that I can see, but I've
developed a two part solution that corrects the problem

Part 1) Modifies the nf_sockopt registration code so that, instead of using a
use counter internal to the nf_sockopt_ops structure, we instead use a pointer
to the registering modules owner to do module reference counting when nf_sockopt
calls a modules set/get routine.  This prevents the deadlock by preventing set 4
from happening.

Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking
remove operations (the same way rmmod does), and add an option to explicity
request blocking operation.  So if you select blocking operation in modprobe you
can still cause the above deadlock, but only if you explicity try (and since
root can do any old stupid thing it would like....  :)  ).
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16fcec35

14 8月, 2007 2 次提交

[IPVS]: Use IP_VS_WAIT_WHILE when encessary. · cae7ca3d

由 Heiko Carstens 提交于 8月 10, 2007

For architectures that don't have a volatile atomic_ts constructs like
while (atomic_read(&something)); might result in endless loops since a
barrier() is missing which forces the compiler to generate code that
actually reads memory contents.
Fix this in ipvs by using the IP_VS_WAIT_WHILE macro which resolves to
while (expr) { cpu_relax(); }
(why isn't this open coded btw?)
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cae7ca3d

[IPV4]: Clean up duplicate includes in net/ipv4/ · f49f9967

由 Jesper Juhl 提交于 8月 10, 2007

This patch cleans up duplicate includes in
	net/ipv4/
Signed-off-by: NJesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f49f9967

31 7月, 2007 1 次提交

[IPVS]: Use skb_forward_csum · ccc7911f

由 Herbert Xu 提交于 7月 30, 2007

As a path that forwards packets, IPVS should be using
skb_forward_csum instead of directly setting ip_summed
to CHECKSUM_NONE.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ccc7911f

20 7月, 2007 1 次提交

mm: Remove slab destructors from kmem_cache_create(). · 20c2df83

由 Paul Mundt 提交于 7月 20, 2007

Slab destructors were no longer supported after Christoph's
c59def9f change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: NPaul Mundt <lethal@linux-sh.org>

20c2df83

11 7月, 2007 1 次提交

[NET]: Make all initialized struct seq_operations const. · 56b3d975

由 Philippe De Muyter 提交于 7月 10, 2007

Make all initialized struct seq_operations in net/ const
Signed-off-by: NPhilippe De Muyter <phdm@macqel.be>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56b3d975

19 6月, 2007 1 次提交

[IPVS]: Fix state variable on failure to start ipvs threads · cc0191ae

由 Neil Horman 提交于 6月 18, 2007

ip_vs currently fails to reset its ip_vs_sync_state variable if the
sync thread fails to start properly.  The result is that the kernel
will report a running daemon when their actuall is none.

If you issue the following commands:

1. ipvsadm --start-daemon master --mcast-interface bla
2. ipvsadm -L --daemon
3. ipvsadm --stop-daemon master

Assuming that bla is not an actual interface, step 2 should return no
data, but instead returns:

$ ipvsadm -L --daemon
master sync daemon (mcast=bla, syncid=0)
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc0191ae

25 5月, 2007 1 次提交

[IPVS]: Use menuconfig objects. · a6938a1e

由 Jan Engelhardt 提交于 5月 23, 2007

Use menuconfigs instead of menus, so the whole menu can be disabled at once
instead of going through all options.
Signed-off-by: NJan Engelhardt <jengelh@gmx.de>
Acked-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6938a1e

10 5月, 2007 2 次提交

unify flush_work/flush_work_keventd and rename it to cancel_work_sync · 28e53bdd

由 Oleg Nesterov 提交于 5月 09, 2007

flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq
(this was possible from the very beginnig, I missed this).  So we can unify
flush_work_keventd and flush_work.

Also, rename flush_work() to cancel_work_sync() and fix all callers.
Perhaps this is not the best name, but "flush_work" is really bad.

(akpm: this is why the earlier patches bypassed maintainers)
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Auke Kok <auke-jan.h.kok@intel.com>,
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

28e53bdd

ipvs: flush defense_work before module unload · c214b2cc

由 Oleg Nesterov 提交于 5月 09, 2007

net/ipv4/ipvs/ip_vs_core.c

	module_exit
	    ip_vs_cleanup
		ip_vs_control_cleanup
		    cancel_rearming_delayed_work
	// done

This is unsafe.  The module may be unloaded and the memory may be freed
while defense_work's handler is still running/preempted.

Do flush_work(&defense_work.work) after cancel_rearming_delayed_work().

Alternatively, we could add flush_work() to cancel_rearming_delayed_work(),
but note that we can't change cancel_delayed_work() in the same manner
because it may be called from atomic context.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c214b2cc

09 5月, 2007 1 次提交

Fix occurrences of "the the " · 59c51591

由 Michael Opdenacker 提交于 5月 09, 2007

Signed-off-by: NMichael Opdenacker <michael@free-electrons.com>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

59c51591

26 4月, 2007 4 次提交

[NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY · 60476372

由 Herbert Xu 提交于 4月 09, 2007

When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
maps to the semantics of CHECKSUM_UNNECESSARY.  Therefore we should
treat it as such in the stack.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60476372

[SK_BUFF]: Introduce skb_copy_to_linear_data{_offset} · 27d7ff46

由 Arnaldo Carvalho de Melo 提交于 3月 31, 2007

To clearly state the intent of copying to linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.
Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>

27d7ff46

[SK_BUFF]: Convert skb->tail to sk_buff_data_t · 27a884dc

由 Arnaldo Carvalho de Melo 提交于 4月 19, 2007

So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)

Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

27a884dc

[SK_BUFF]: Use offsets for skb->{mac,network,transport}_header on 64bit architectures · 2e07fa9c

由 Arnaldo Carvalho de Melo 提交于 4月 10, 2007

With this we save 8 bytes per network packet, leaving a 4 bytes hole to be used
in further shrinking work, likely with the offsetization of other pointers,
such as ->{data,tail,end}, at the cost of adds, that were minimized by the
usual practice of setting skb->{mac,nh,n}.raw to a local variable that is then
accessed multiple times in each function, it also is not more expensive than
before with regards to most of the handling of such headers, like setting one
of these headers to another (transport to network, etc), or subtracting, adding
to/from it, comparing them, etc.

Now we have this layout for sk_buff on a x86_64 machine:

[acme@mica net-2.6.22]$ pahole vmlinux sk_buff
struct sk_buff {
	struct sk_buff *       next;             /*   0   8 */
	struct sk_buff *       prev;             /*   8   8 */
	struct rb_node         rb;               /*  16  24 */
	struct sock *          sk;               /*  40   8 */
	ktime_t                tstamp;           /*  48   8 */
	struct net_device *    dev;              /*  56   8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct net_device *    input_dev;        /*  64   8 */
	sk_buff_data_t         transport_header; /*  72   4 */
	sk_buff_data_t         network_header;   /*  76   4 */
	sk_buff_data_t         mac_header;       /*  80   4 */

	/* XXX 4 bytes hole, try to pack */

	struct dst_entry *     dst;              /*  88   8 */
	struct sec_path *      sp;               /*  96   8 */
	char                   cb[48];           /* 104  48 */
	/* cacheline 2 boundary (128 bytes) was 24 bytes ago*/
	unsigned int           len;              /* 152   4 */
	unsigned int           data_len;         /* 156   4 */
	unsigned int           mac_len;          /* 160   4 */
	union {
		__wsum         csum;             /*       4 */
		__u32          csum_offset;      /*       4 */
	};                                       /* 164   4 */
	__u32                  priority;         /* 168   4 */
	__u8                   local_df:1;       /* 172   1 */
	__u8                   cloned:1;         /* 172   1 */
	__u8                   ip_summed:2;      /* 172   1 */
	__u8                   nohdr:1;          /* 172   1 */
	__u8                   nfctinfo:3;       /* 172   1 */
	__u8                   pkt_type:3;       /* 173   1 */
	__u8                   fclone:2;         /* 173   1 */
	__u8                   ipvs_property:1;  /* 173   1 */

	/* XXX 2 bits hole, try to pack */

	__be16                 protocol;         /* 174   2 */
	void    (*destructor)(struct sk_buff *); /* 176   8 */
	struct nf_conntrack *  nfct;             /* 184   8 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	struct sk_buff *       nfct_reasm;       /* 192   8 */
	struct nf_bridge_info *nf_bridge;        /* 200   8 */
	__u16                  tc_index;         /* 208   2 */
	__u16                  tc_verd;          /* 210   2 */
	dma_cookie_t           dma_cookie;       /* 212   4 */
	__u32                  secmark;          /* 216   4 */
	__u32                  mark;             /* 220   4 */
	unsigned int           truesize;         /* 224   4 */
	atomic_t               users;            /* 228   4 */
	unsigned char *        head;             /* 232   8 */
	unsigned char *        data;             /* 240   8 */
	unsigned char *        tail;             /* 248   8 */
	/* --- cacheline 4 boundary (256 bytes) --- */
	unsigned char *        end;              /* 256   8 */
}; /* size: 264, cachelines: 5 */
   /* sum members: 260, holes: 1, sum holes: 4 */
   /* bit holes: 1, sum bit holes: 2 bits */
   /* last cacheline: 8 bytes */

On 32 bits nothing changes, and pointers continue to be used with the compiler
turning all this abstraction layer into dust. But there are some sk_buff
validation tricks that are now possible, humm... :-)
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2e07fa9c