Commits · cb7f6a7b716e801097b564dec3ccb58d330aef56 · openeuler / raspberrypi-kernel

07 Oct, 2008 (1 commit)

IPVS: Move IPVS to net/netfilter/ipvs · cb7f6a7b

Committed by Julius Volz on Sep 19, 2008

Since IPVS now has partial IPv6 support, this patch moves IPVS from
net/ipv4/ipvs to net/netfilter/ipvs. It's a result of:

$ git mv net/ipv4/ipvs net/netfilter

and adapting the relevant Kconfigs/Makefiles to the new path.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

22 Sep, 2008 (2 commits)

ipvs: Fix unused label warning · 8d5803bf

Committed by Sven Wegener on Sep 20, 2008

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: Restrict sync message to 255 connections · e6f225eb

Committed by Sven Wegener on Sep 19, 2008

The nr_conns variable in the sync message header is only eight bits wide
and will overflow on interfaces with a large MTU. As a result the backup
won't parse all connections contained in the sync buffer. On regular
ethernet with an MTU of 1500 this isn't a problem, because we can't
overflow the value, but consider jumbo frames being used on a cross-over
connection between both directors.

We now restrict the size of the sync buffer, so that we never put more
than 255 connections into a single sync buffer.
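
A minimal sketch of the clamping arithmetic, assuming a 20-byte IPv4 header, an 8-byte UDP header, a 4-byte sync message header and 24-byte connection entries without options. These sizes, the `SKETCH_*` names and both helper functions are illustrative assumptions, not taken from the patch. The point is that the entry count derived from the MTU is capped at 255 before it is stored in the 8-bit `nr_conns` field.

```c
#include <stddef.h>

/* Assumed sizes for this sketch, not taken from the patch. */
#define SKETCH_IPHDR_LEN         20u  /* IPv4 header without options */
#define SKETCH_UDPHDR_LEN         8u
#define SKETCH_MESG_HDR_LEN       4u  /* nr_conns (u8), syncid (u8), size (u16) */
#define SKETCH_CONN_ENTRY_LEN    24u  /* one connection entry, no options */

/* nr_conns is eight bits wide, so one buffer may describe at most 255. */
#define SKETCH_MAX_CONNS_PER_BUF 255u

/* Number of connection entries one sync datagram may carry for this MTU. */
static unsigned conns_per_sync_buffer(unsigned mtu)
{
	unsigned overhead = SKETCH_IPHDR_LEN + SKETCH_UDPHDR_LEN +
			    SKETCH_MESG_HDR_LEN;
	unsigned n;

	if (mtu <= overhead)
		return 0;
	n = (mtu - overhead) / SKETCH_CONN_ENTRY_LEN;
	/* Clamp so the 8-bit counter cannot wrap on jumbo frames. */
	return n < SKETCH_MAX_CONNS_PER_BUF ? n : SKETCH_MAX_CONNS_PER_BUF;
}

/* Maximum sync buffer length: message header plus the allowed entries. */
static size_t sync_buffer_len(unsigned mtu)
{
	return SKETCH_MESG_HDR_LEN +
	       (size_t)SKETCH_CONN_ENTRY_LEN * conns_per_sync_buffer(mtu);
}
```

Under these assumptions a 1500-byte MTU yields 61 entries, well below the limit, while a 9000-byte jumbo MTU would yield 373 without the cap; stored in eight bits, that count would read back as 117.
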
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Simon Horman <horms@verge.net.au>

21 Sep, 2008 (17 commits)

tcp: advertise MSS requested by user · f5fff5dc

Committed by Tom Quetchenbach on Sep 21, 2008

I'm trying to use the TCP_MAXSEG option to setsockopt() to set the MSS
for both sides of a bidirectional connection.

man tcp says: "If this option is set before connection establishment, it
also changes the MSS value announced to the other end in the initial
packet."

However, the kernel only uses the MTU/route cache to set the advertised
MSS. That means if I set the MSS to, say, 500 before calling connect(),
I will send at most 500-byte packets, but I will still receive 1500-byte
packets in reply.

This is a bug, either in the kernel or the documentation.

This patch (applies to latest net-2.6) reduces the advertised value to
that requested by the user as long as setsockopt() is called before
connect() or accept(). This seems like the behavior that one would
expect as well as that which is documented.
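
A minimal userspace sketch of the usage the message describes: set `TCP_MAXSEG` before `connect()`. The `advertised_mss()` helper is a hypothetical, simplified model of the patched rule (announce the route-derived MSS, lowered to the user's value when that is smaller); it is not the kernel's code, and `set_mss_before_connect()` is likewise an illustrative name.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical model of the patched rule: user_mss == 0 means "not set";
 * otherwise the announced MSS is lowered to it when it is smaller. */
static int advertised_mss(int route_mss, int user_mss)
{
	if (user_mss > 0 && user_mss < route_mss)
		return user_mss;
	return route_mss;
}

/* Set TCP_MAXSEG before connect() (or on the listening socket, for the
 * accepting side). Returns 0, or -1 with errno set. */
int set_mss_before_connect(int fd, int mss)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss));
}
```

A client would call `socket()`, then `set_mss_before_connect(fd, 500)`, then `connect()`; with the patch applied, the SYN should announce an MSS of at most 500.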

I've tried to make sure that things that depend on the advertised MSS
are set correctly.
Signed-off-by: Tom Quetchenbach <virtualphtn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Use hton[sl]() instead of __constant_hton[sl]() where applicable · 60678040

Committed by Arnaldo Carvalho de Melo on Sep 20, 2008

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: back retransmit_high when it over-estimated · 618d9f25

Committed by Ilpo Järvinen on Sep 20, 2008

If lost skb is sacked, we might have nothing to retransmit
as high as the retransmit_high is pointing to, so place
it lower to avoid unnecessary walking.

This is mainly for the case where high L'ed skbs get sacked.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: don't clear lost_skb_hint when not necessary · 90638a04

Committed by Ilpo Järvinen on Sep 20, 2008

Most importantly, avoid doing it with cumulative ACK. However,
since lost_cnt_hint is in the picture as well and needs
adjustments, it's not as trivial as dealing with
retransmit_skb_hint (and cannot be done in all the places where we
could trivially leave retransmit_skb_hint untouched).

With the previous patch, this should mostly remove the O(n^2)
behavior while cumulative ACKs start flowing once the rexmit
after a lossy round-trip has made it through.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: don't clear retransmit_skb_hint when not necessary · ef9da47c

Committed by Ilpo Järvinen on Sep 20, 2008

Most importantly, avoid doing it with cumulative ACK. Not clearing
means that we no longer need n^2 processing in the resolution of each
fast recovery.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove retransmit_skb_hint clearing from failure · f0ceb0ed

Committed by Ilpo Järvinen on Sep 20, 2008

This doesn't make much sense here as far as I can tell, and it
probably never has. Since fragmenting and collapsing deal with the
hints themselves, there should be very little reason for the rexmit
loop to do that.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: reorganize retransmit code loops · 0e1c54c2

Committed by Ilpo Järvinen on Sep 20, 2008

Both loops are quite similar, so they can be combined
with little effort. As a result, forward_skb_hint becomes
obsolete as well.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove tp->lost_out guard to make joining diff nicer · 08ebd172

Committed by Ilpo Järvinen on Sep 20, 2008

The validity of retransmit_high must then be ensured
if no L'ed skb exists!

This makes a minor change to behavior: we now have to
iterate the head to find out that the loop terminates.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Reorganize skb tagbit checks · 61eb55f4

Committed by Ilpo Järvinen on Sep 20, 2008

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove obsolete validity concern · 34638570

Committed by Ilpo Järvinen on Sep 20, 2008

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add tcp_can_forward_retransmit · b5afe7bc

Committed by Ilpo Järvinen on Sep 20, 2008

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: No need to clear retransmit_skb_hint when SACKing · 184d68b2

Committed by Ilpo Järvinen on Sep 20, 2008

Because the lost counter no longer requires tuning, this is
trivial to remove (the tuning wouldn't have been too hard
either), since SACKing certainly cannot make a "new"
retransmittable skb appear below retransmit_skb_hint.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Kill precaution that's very likely obsolete · f09142ed

Committed by Ilpo Järvinen on Sep 20, 2008

I suspect it might have been related to the changed amount
of lost skbs, which was counted by retransmit_cnt_hint that
got changed.

The place for this clearing was very illogical anyway,
it should have been after the LOST-bit clearing loop to
make any sense.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: convert retransmit_cnt_hint to seqno · 006f582c

Committed by Ilpo Järvinen on Sep 20, 2008

The main benefit of this is that we can then freely point
retransmit_skb_hint anywhere we want, because there's no
longer any need to know what count changes would be involved.
Since this is really used only as a terminator, the unnecessary
work is at most a one-time walk, and if some retransmissions
are necessary after that point later on, the walk is not a
full waste of time anyway.

Since retransmit_high must be kept valid, all lost
markers must ensure that.

Now I have also learned how those "holes" in the
rexmittable skbs can appear: MTU probing creates them. So
I removed the misleading comment as well.
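
A heavily simplified userspace model of the idea, assuming a write queue stored as an array in sequence order. All names here are hypothetical; the kernel itself works on skb lists with `retransmit_skb_hint` and `retransmit_high` in `struct tcp_sock`. Marking a segment lost keeps the hint at the lowest lost segment and raises `retransmit_high` to the highest lost `end_seq`, so the retransmit walk can stop there without needing an exact lost count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe sequence comparisons, in the style of the kernel's before()/after(). */
static bool seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool seq_after(uint32_t a, uint32_t b) { return seq_before(b, a); }

/* Hypothetical, simplified segment and socket state. */
struct seg_sketch {
	uint32_t seq, end_seq;
	bool lost, sacked, retransmitted;
};

struct tp_sketch {
	struct seg_sketch *q;     /* write queue, in sequence order */
	size_t n;                 /* number of segments in q */
	size_t lost_out;          /* segments currently marked lost */
	size_t hint;              /* lowest lost index; n means "no hint" */
	uint32_t retransmit_high; /* highest end_seq of any lost segment */
};

/* Mark segment i lost, keeping the hint and retransmit_high valid. */
static void mark_lost(struct tp_sketch *tp, size_t i)
{
	struct seg_sketch *s = &tp->q[i];

	if (s->lost || s->sacked)
		return;
	if (tp->hint == tp->n || i < tp->hint)
		tp->hint = i;
	if (tp->lost_out == 0 || seq_after(s->end_seq, tp->retransmit_high))
		tp->retransmit_high = s->end_seq;
	tp->lost_out++;
	s->lost = true;
}

/* Retransmit lost segments from the hint onward, stopping at the first
 * segment that starts at or beyond retransmit_high. */
static size_t retransmit_lost(struct tp_sketch *tp)
{
	size_t sent = 0;

	for (size_t i = tp->hint; i < tp->n; i++) {
		struct seg_sketch *s = &tp->q[i];

		if (!seq_before(s->seq, tp->retransmit_high))
			break;
		if (s->lost && !s->sacked && !s->retransmitted) {
			s->retransmitted = true;
			sent++;
		}
	}
	return sent;
}
```

For instance, marking segments 3 and then 1 of a five-segment queue leaves the hint at index 1 and `retransmit_high` at segment 3's `end_seq`, so the walk retransmits two segments and stops before reaching segment 4.
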
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add helper for lost bit toggling · 41ea36e3

Committed by Ilpo Järvinen on Sep 20, 2008

This is useful because we'll soon need to do the verification in many
places, which makes things slightly more complex than they used to be.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move tcp_verify_retransmit_hint · c8c213f2

Committed by Ilpo Järvinen on Sep 20, 2008

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Partial hint clearing has again become meaningless · 64edc273

Committed by Ilpo Järvinen on Sep 20, 2008

I.e., the difference between partial and full clearing no longer
exists, since the SACK optimizations were dropped by a
sacktag rewrite.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Sep, 2008 (4 commits)

ipvs: change some __constant_htons() to htons() · d286600e

Committed by Brian Haley on Sep 16, 2008

Change __constant_htons() to htons() in the IPVS code when not in an
initializer.
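
A userspace sketch of the distinction, assuming the GCC/Clang predefined byte-order macros. `CONST_HTONS()` is a hypothetical stand-in for the kernel's `__constant_htons()`: it expands to an integer constant expression, so it is valid in a static initializer, whereas ordinary code can simply call `htons()`.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __constant_htons(). It is an
 * integer constant expression, so it may appear in static initializers.
 * Assumes little-endian when the compiler does not define __BYTE_ORDER__. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define CONST_HTONS(x) ((uint16_t)(x))
#else
#define CONST_HTONS(x) \
	((uint16_t)((((x) & 0x00ffu) << 8) | (((x) & 0xff00u) >> 8)))
#endif

/* Initializer: needs a compile-time constant. */
static const uint16_t http_port_be = CONST_HTONS(80);

/* Ordinary code path: plain htons() is the idiomatic choice. */
uint16_t port_to_be(uint16_t port)
{
	return htons(port);
}
```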

-Brian
Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: add __aquire/__release annotations to ip_vs_info_seq_start/ip_vs_info_seq_stop · 563e94f0

Committed by Simon Horman on Sep 17, 2008

This teaches sparse that the following are not problems:

make C=1
  CHECK   net/ipv4/ipvs/ip_vs_ctl.c
net/ipv4/ipvs/ip_vs_ctl.c:1793:14: warning: context imbalance in 'ip_vs_info_seq_start' - wrong count at exit
net/ipv4/ipvs/ip_vs_ctl.c:1842:13: warning: context imbalance in 'ip_vs_info_seq_stop' - unexpected unlock
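
A minimal userspace sketch of the annotation pattern, with a pthread mutex standing in for `__ip_vs_svc_lock` and hypothetical `table_seq_*` functions standing in for the seq_file callbacks. The macros mirror the kernel's `__acquires()`/`__releases()`: under sparse (`__CHECKER__`) they declare that a function returns with the lock held, or is entered with it held, and for a normal compiler they expand to nothing.

```c
#include <pthread.h>
#include <stddef.h>

/* Mirrors the kernel's annotations: context attributes for sparse
 * (__CHECKER__), nothing for a normal compiler. */
#ifdef __CHECKER__
#define __acquires(x) __attribute__((context(x, 0, 1)))
#define __releases(x) __attribute__((context(x, 1, 0)))
#else
#define __acquires(x)
#define __releases(x)
#endif

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table[4] = { 10, 20, 30, 40 };

/* Hypothetical seq_file-style start: returns with table_lock held,
 * even when there is no element at pos. */
static int *table_seq_start(int pos) __acquires(table_lock)
{
	pthread_mutex_lock(&table_lock);
	return (pos >= 0 && pos < 4) ? &table[pos] : NULL;
}

/* Matching stop: drops the lock taken by table_seq_start(). */
static void table_seq_stop(void) __releases(table_lock)
{
	pthread_mutex_unlock(&table_lock);
}
```
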
Acked-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: supply a valid 0 address to ip_vs_conn_new() · dff630dd

Committed by Simon Horman on Sep 17, 2008

ip_vs_conn_new expects a union nf_inet_addr as the type for its address
parameters, not a plain integer.
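
A minimal sketch of the fix, assuming a local copy of the `union nf_inet_addr` layout (`union inet_addr_sketch`) and a hypothetical `conn_new_sketch()` in place of `ip_vs_conn_new()`. Once the parameter is a pointer, a literal `0` becomes a null pointer, so the caller passes the address of an all-zero union instead.

```c
#include <netinet/in.h>
#include <stdint.h>

/* Local copy of the union nf_inet_addr layout, for illustration only. */
union inet_addr_sketch {
	uint32_t all[4];
	uint32_t ip;
	uint32_t ip6[4];
	struct in_addr in;
	struct in6_addr in6;
};

static int addr_is_zero(const union inet_addr_sketch *a)
{
	return !(a->all[0] | a->all[1] | a->all[2] | a->all[3]);
}

/* Hypothetical stand-in for ip_vs_conn_new(): address parameters are
 * pointers and are dereferenced, so a literal 0 here would crash.
 * Returns 1 when caddr is set and daddr is all-zero (illustration only). */
static int conn_new_sketch(const union inet_addr_sketch *caddr,
			   const union inet_addr_sketch *daddr)
{
	return !addr_is_zero(caddr) && addr_is_zero(daddr);
}

/* Caller: pass the address of a real all-zero union, not the integer 0. */
static int new_bypass_conn_sketch(const union inet_addr_sketch *caddr)
{
	union inet_addr_sketch daddr = { .all = { 0, 0, 0, 0 } };

	return conn_new_sketch(caddr, &daddr);
}
```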

This problem was detected by sparse.

make C=1
  CHECK   net/ipv4/ipvs/ip_vs_core.c
net/ipv4/ipvs/ip_vs_core.c:469:9: warning: Using plain integer as NULL pointer
Acked-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: only unlock in ip_vs_edit_service() if already locked · 9e691ed6

Committed by Simon Horman on Sep 17, 2008

Jumping to out unlocks __ip_vs_svc_lock, but that lock is not taken until
after code that may jump to out.
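
A minimal sketch of the two-label error path, using a pthread mutex in place of `__ip_vs_svc_lock`; `edit_service_sketch()` and its range checks are hypothetical. Failures detected before the lock is taken jump past the unlock, and, as elsewhere in the kernel, errors are returned as negative errno values.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t svc_lock = PTHREAD_MUTEX_INITIALIZER;
static int svc_weight;

/* Hypothetical edit routine. Checks that run before the lock is taken jump
 * to "out", which must not unlock; later failures jump to "out_unlock". */
int edit_service_sketch(int new_weight)
{
	int ret = 0;

	if (new_weight < 0) {
		ret = -EINVAL;
		goto out;		/* lock not held yet */
	}

	pthread_mutex_lock(&svc_lock);
	if (new_weight > 65535) {
		ret = -ERANGE;
		goto out_unlock;
	}
	svc_weight = new_weight;

out_unlock:
	pthread_mutex_unlock(&svc_lock);
out:
	return ret;
}
```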

This problem was detected by sparse.

make C=1
  CHECK   net/ipv4/ipvs/ip_vs_ctl.c
net/ipv4/ipvs/ip_vs_ctl.c:1332:2: warning: context imbalance in 'ip_vs_edit_service' - unexpected unlock
Acked-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

13 Sep, 2008 (1 commit)

net: ip_vs_proto_{tcp,udp} build fix · 63f2c046

Committed by Stephen Rothwell on Sep 12, 2008

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Sep, 2008 (6 commits)

This reverts "Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp " · 410e27a4

Committed by Gerrit Renker on Sep 09, 2008

as it accidentally contained the wrong set of patches. These will be
submitted separately.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

ipvs: Embed user stats structure into kernel stats structure · e9c0ce23

Committed by Sven Wegener on Sep 08, 2008

Instead of duplicating the fields, integrate a user stats structure into
the kernel stats structure. This is more robust when the members are
changed, because they are now automatically kept in sync.
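
A simplified sketch of the layout change, with a pthread mutex in place of the kernel spinlock and only a subset of the counters; the `*_sketch` names and helpers are hypothetical. Because the kernel-side structure embeds the user-visible one, copying statistics out is a single structure assignment, and a new counter only has to be added in one place.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the user-visible counters (a subset of fields). */
struct stats_user_sketch {
	uint32_t conns;
	uint32_t inpkts;
	uint32_t outpkts;
	uint64_t inbytes;
	uint64_t outbytes;
};

/* Kernel-side structure: embeds the user structure rather than repeating
 * its fields, so both stay in sync when a counter is added. */
struct stats_kernel_sketch {
	struct stats_user_sketch ustats;
	pthread_mutex_t lock;
};

static void stats_init(struct stats_kernel_sketch *s)
{
	s->ustats = (struct stats_user_sketch){ 0 };
	pthread_mutex_init(&s->lock, NULL);
}

static void stats_account_in(struct stats_kernel_sketch *s, uint64_t bytes)
{
	pthread_mutex_lock(&s->lock);
	s->ustats.inpkts++;
	s->ustats.inbytes += bytes;
	pthread_mutex_unlock(&s->lock);
}

/* Copy-out is one structure assignment instead of field-by-field copies. */
static void stats_copy_to_user(struct stats_kernel_sketch *s,
			       struct stats_user_sketch *dst)
{
	pthread_mutex_lock(&s->lock);
	*dst = s->ustats;
	pthread_mutex_unlock(&s->lock);
}
```
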
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Reviewed-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: Restrict connection table size via Kconfig · 2206a3f5

Committed by Sven Wegener on Sep 08, 2008

Instead of checking the value in include/net/ip_vs.h, we can just
restrict the range in our Kconfig file. This rejects values outside
of the range early, at configuration time.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Reviewed-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Remove incorrect ip_route_me_harder(), fix IPv6 · 9d7f2a2b

Committed by Julius Volz on Sep 08, 2008

Remove an incorrect ip_route_me_harder() that was probably a result of
merging my IPv6 patches with the local client patches. With this, IPv6+NAT
are working again.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: handle PARTIAL_CHECKSUM · 503e81f6

Committed by Simon Horman on Sep 08, 2008

Now that LVS can load balance locally generated traffic, packets may come
from the loopback device and thus may have a partial checksum.

The existing code allows for the case where there is no checksum at all for
TCP; however, Herbert Xu has confirmed that this is not legal.
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julius Volz <juliusv@google.com>

netns : fix kernel panic in timewait socket destruction · d315492b

Committed by Daniel Lezcano on Sep 08, 2008

How to reproduce ?
 - create a network namespace
 - use tcp protocol and get timewait socket
 - exit the network namespace
 - after a moment (when the timewait socket is destroyed), the kernel
   panics.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
PGD 119985067 PUD 11c5c0067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd
edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks
sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3
RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>]
inet_twdr_do_twkill_work+0x6e/0xb8
RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246
RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30
RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00
RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200
R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000)
knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8800bff9e000, task
ffff88011ff76690)
Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a
0000000000000008
0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7
ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108
Call Trace:
<IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e
[<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e
[<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193
[<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd
[<ffffffff8200d08c>] ? call_softirq+0x1c/0x28
[<ffffffff8200e611>] ? do_softirq+0x2c/0x68
[<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9
[<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b
[<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d


Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7
65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0
48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00
RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
RSP <ffff88011ff7fed0>
CR2: 0000000000000007

This patch provides a function to purge all timewait sockets related
to a network namespace. The timewait sockets' life cycle is not tied to
the network namespace, which means the timewait sockets stay alive while
the network namespace dies. Timewait sockets exist to avoid receiving
duplicate packets from the network; if the network namespace is freed,
its network stack is removed, so there is no chance of receiving any
packets from the outside world. Furthermore, having a pending destruction
timer on these sockets after the network namespace has been freed is not
safe and will lead to an oops when the timer callback tries to access
data belonging to the namespace, for example in:
	inet_twdr_do_twkill_work
		-> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);

Purging the timewait sockets at the network namespace destruction will:
 1) speed up memory freeing for the namespace
 2) fix kernel panic on asynchronous timewait destruction
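
A heavily simplified userspace model of the purge, assuming a singly linked list of time-wait entries that each point at their namespace. All names are hypothetical and this is not the kernel's data structure. Entries of the exiting namespace are unlinked and destroyed immediately, so no later timer can touch the freed namespace.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified model (not kernel code): each time-wait
 * entry points at the namespace it was created in. */
struct netns_sketch {
	int id;
	unsigned long timewaited;   /* stands in for LINUX_MIB_TIMEWAITED */
};

struct tw_sketch {
	struct netns_sketch *net;
	struct tw_sketch *next;
};

/* Add a time-wait entry at the head of the list; returns 0 or -1. */
static int tw_add(struct tw_sketch **head, struct netns_sketch *net)
{
	struct tw_sketch *tw = malloc(sizeof(*tw));

	if (!tw)
		return -1;
	tw->net = net;
	tw->next = *head;
	*head = tw;
	return 0;
}

/* Destruction path: updates a counter through tw->net, which is why the
 * namespace must still be alive when this runs. */
static void tw_kill(struct tw_sketch *tw)
{
	tw->net->timewaited++;
	free(tw);
}

/* On namespace exit, destroy that namespace's entries immediately instead
 * of leaving them to a timer that could fire after the namespace is gone. */
static void tw_purge_net(struct tw_sketch **head, const struct netns_sketch *net)
{
	struct tw_sketch **pp = head;

	while (*pp) {
		struct tw_sketch *tw = *pp;

		if (tw->net == net) {
			*pp = tw->next;    /* unlink before freeing */
			tw_kill(tw);
		} else {
			pp = &tw->next;
		}
	}
}
```
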
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Sep, 2008 (6 commits)

IPVS: use ipv6_addr_copy() · 178f5e49

Committed by Simon Horman on Sep 08, 2008

It is standard to use ipv6_addr_copy() to fill in
the in6 element of a union nf_inet_addr snet.
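
A userspace sketch, assuming a local copy of the `union nf_inet_addr` layout; `ipv6_addr_copy_sketch()` models the kernel helper of that era as a plain copy of the 16-byte `struct in6_addr`, and `fill_snet_v6()` is a hypothetical caller.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the union nf_inet_addr layout, for illustration only. */
union inet_addr_sketch {
	uint32_t all[4];
	uint32_t ip;
	uint32_t ip6[4];
	struct in_addr in;
	struct in6_addr in6;
};

/* Models ipv6_addr_copy() as a plain copy of the 16-byte address. */
static void ipv6_addr_copy_sketch(struct in6_addr *dst,
				  const struct in6_addr *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Fill the in6 member of the union from an IPv6 address. */
static void fill_snet_v6(union inet_addr_sketch *snet,
			 const struct in6_addr *addr)
{
	ipv6_addr_copy_sketch(&snet->in6, addr);
}
```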

Thanks to Julius Volz for pointing this out.

Cc: Brian Haley <brian.haley@hp.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julius Volz <juliusv@google.com>

IPVS: fix bogus indentation · 5af149cc

Committed by Simon Horman on Sep 08, 2008

Sorry, this was my error.
Thanks to Julius Volz for pointing it out.
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julius Volz <juliusv@google.com>

ipvs: Reject ipv6 link-local addresses for destinations · 3bfb92f4

Committed by Sven Wegener on Sep 05, 2008

We can't use non-local link-local addresses as destinations without
knowing the interface on which we can reach the address. Reject them for
now.
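
A minimal userspace sketch of the check using the standard `IN6_IS_ADDR_LINKLOCAL()` macro from `<netinet/in.h>`. `check_dest_v6()` is hypothetical, and here the caller supplies whether the address is local, which the kernel would determine itself.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical destination check: a link-local IPv6 address (fe80::/10) is
 * only usable with a known outgoing interface, so reject it unless it is
 * one of this host's own addresses (is_local, supplied by the caller). */
static int check_dest_v6(const struct in6_addr *addr, bool is_local)
{
	if (IN6_IS_ADDR_LINKLOCAL(addr) && !is_local)
		return -EINVAL;
	return 0;
}
```
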
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: Mark tcp/udp v4 and v6 debug functions static · 77eb8516

Committed by Sven Wegener on Sep 05, 2008

They are only used in this file, so they should be static.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: Return negative error values from ip_vs_edit_service() · a5ba4bf2

Committed by Sven Wegener on Sep 05, 2008

Like the other code in this function does.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: Use pointer to address from sync message · cd9fe6c4

Committed by Sven Wegener on Sep 05, 2008

We want a pointer to it, not the value casted to a pointer.
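
A tiny sketch of the difference, with a hypothetical `struct sync_conn_sketch` holding a 32-bit address field: the fix takes the field's address, whereas casting the field's value to a pointer would produce a meaningless pointer.

```c
#include <stdint.h>

/* Hypothetical sync record with a 32-bit IPv4 address field. */
struct sync_conn_sketch {
	uint32_t caddr;
};

/* Correct: return the address of the field. The buggy form, casting the
 * field's value, e.g. (const uint32_t *)(uintptr_t)s->caddr, would turn an
 * IP address into a meaningless pointer. */
static const uint32_t *caddr_ptr(const struct sync_conn_sketch *s)
{
	return &s->caddr;
}
```
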
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

05 Sep, 2008 (3 commits)

ipvs: load balance ipv6 connections from a local process · f2428ed5

Committed by Simon Horman on Sep 05, 2008

This allows IPVS to load balance IPv6 connections made by a local process.
For example a proxy server running locally.

External client --> pound:443 -> Local:443 --> IPVS:80 --> RealServer

This is an extension to the IPv4 work done in this area
by Siim Põder and Malcolm Turnbull.

Cc: Siim Põder <siim@p6drad-teel.net>
Cc: Malcolm Turnbull <malcolm@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: load balance IPv4 connections from a local process · 4856c84c

Committed by Malcolm Turnbull on Sep 05, 2008

This allows IPVS to load balance connections made by a local process.
For example a proxy server running locally.

External client --> pound:443 -> Local:443 --> IPVS:80 --> RealServer
Signed-off-by: Siim Põder <siim@p6drad-teel.net>
Signed-off-by: Malcolm Turnbull <malcolm@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Allow adding IPv6 services from userspace · f94fd041

Committed by Julius Volz on Sep 02, 2008

Allow adding IPv6 services through the genetlink interface and add checks
to see if the chosen scheduler is supported with IPv6 and whether the
supplied prefix length is sane. Make sure the service count exported via
the sockopt interface only counts IPv4 services.
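
A minimal sketch of the two new checks, with a hypothetical scheduler descriptor carrying an IPv6 capability flag. The choice of `-EAFNOSUPPORT` for an unsupported scheduler is an assumption; the valid prefix range of 1 to 128 follows from the 128-bit IPv6 address length.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical scheduler descriptor with an IPv6 capability flag. */
struct sched_sketch {
	const char *name;
	bool supports_ipv6;
};

/* Validate a new service. IPv6 services need an IPv6-capable scheduler
 * and a prefix length between 1 and 128. Returns 0 or a negative errno. */
static int check_new_service(int af, const struct sched_sketch *sched,
			     unsigned prefix_len)
{
	if (af != AF_INET6)
		return 0;
	if (!sched->supports_ipv6)
		return -EAFNOSUPPORT;	/* error code is an assumption */
	if (prefix_len < 1 || prefix_len > 128)
		return -EINVAL;
	return 0;
}
```
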
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>