Commits · 22712813620fa8e682dbfb253a60ca0131da1e07 · openeuler / Kernel

Jan 04, 2006 · 16 commits

[INET]: Generalise tcp_v4_hash_connect · a7f5e7f1

Committed by Arnaldo Carvalho de Melo on Dec 13, 2005

Renaming it to inet_hash_connect, making it possible to ditch
dccp_v4_hash_connect and share the same code with TCP instead.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7f5e7f1

[TWSK]: Introduce struct timewait_sock_ops · 6d6ee43e

Committed by Arnaldo Carvalho de Melo on Dec 13, 2005

So that we can share several timewait sockets related functions and
make the timewait mini sockets infrastructure closer to the request
mini sockets one.

Next changesets will take advantage of this, moving more code out of
TCP and DCCP v4 and v6 to common infrastructure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d6ee43e

[IPV6]: Introduce inet6_timewait_sock · 0fa1a53e

Committed by Arnaldo Carvalho de Melo on Dec 13, 2005

Out of tcp6_timewait_sock, that now is just an aggregation of
inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
inet_timewait_sock, that is common to the IPv6 transport protocols that use
timewait sockets, like DCCP and TCP.

tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
code to find the IPv6 area in a timewait sock.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fa1a53e

[IPVS]: remove dead code · f1f71e03

Committed by Roberto Nibali on Dec 13, 2005

This patch removes dead code. I don't see the reason to keep this cruft
around, besides cluttering the nice and functionally working code.
Signed-off-by: Roberto Nibali <ratz@drugphish.ch>
Signed-off-by: Horms <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

f1f71e03

[UDP]: udp_checksum_init return value · 65a45441

Committed by Stephen Hemminger on Dec 13, 2005

Since udp_checksum_init always returns 0 there is no point in
having it return a value.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

65a45441

[IP]: Simplify and consolidate MSG_PEEK error handling · 3305b80c

Committed by Herbert Xu on Dec 13, 2005

When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled
it is left on the socket receive queue.  This means that when we detect
a checksum error we have to be careful when trying to free the packet
as someone could have dequeued it in the time being.

Currently this delicate logic is duplicated three times between UDPv4,
UDPv6 and RAWv6.  This patch moves them into one place and simplifies
the code somewhat.

This is based on a suggestion by Eric Dumazet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

3305b80c
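
The race described in this commit can be modelled in a few lines. This is a simplified user-space sketch in Python, not the kernel code: `ReceiveQueue` and its methods are hypothetical names, a `threading.Lock` stands in for the receive-queue lock, and the "still at the head of the queue" test is an assumption about how "still queued" is decided.

```python
import threading
from collections import deque


class ReceiveQueue:
    """Toy stand-in for a socket receive queue guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._packets = deque()

    def enqueue(self, packet):
        with self._lock:
            self._packets.append(packet)

    def peek(self):
        """Return the head packet but leave it queued (the MSG_PEEK case)."""
        with self._lock:
            return self._packets[0] if self._packets else None

    def dequeue(self):
        """Remove and return the head packet, or None if the queue is empty."""
        with self._lock:
            return self._packets.popleft() if self._packets else None

    def kill_datagram(self, packet):
        """Drop a bad packet only if it is still at the head of the queue.

        Returns True if this call removed it, False if another reader
        dequeued it in the meantime (that reader now owns the packet).
        """
        with self._lock:
            if self._packets and self._packets[0] is packet:
                self._packets.popleft()
                return True
            return False
```

Keeping the identity check and the removal under one lock is what lets every caller share a single helper instead of repeating the check per protocol.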

[ICSK]: Move v4_addr2sockaddr from TCP to icsk · af05dc93

Committed by Arnaldo Carvalho de Melo on Dec 13, 2005

Renaming it to inet_csk_addr2sockaddr.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af05dc93

[ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops · 8292a17a

Committed by Arnaldo Carvalho de Melo on Dec 13, 2005

And move it to struct inet_connection_sock. DCCP will use it in the
upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8292a17a

[IPV6]: Introduce inet6_rsk() · ca304b61

Committed by Arnaldo Carvalho de Melo on Dec 13, 2005

And inet6_rsk_offset in inet_request_sock, for the same reasons as
inet_sock's pinfo6 member.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca304b61

[ICSK]: make inet_csk_reqsk_queue_hash_add timeout arg unsigned long · c2977c22

Committed by Arnaldo Carvalho de Melo on Dec 13, 2005

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2977c22

[IPV6]: Reuse inet_csk_get_port in tcp_v6_get_port · 971af18b

Committed by Arnaldo Carvalho de Melo on Dec 13, 2005

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

971af18b

[IPV4]: Safer reassembly · 89cee8b1

Committed by Herbert Xu on Dec 13, 2005

Another spin of Herbert Xu's "safer ip reassembly" patch
for 2.6.16.

(The original patch is here:
http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2
and my only contribution is to have tested it.)

This patch (optionally) does additional checks before accepting IP
fragments, which can greatly reduce the possibility of reassembling
fragments which originated from different IP datagrams.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

89cee8b1

[NETFILTER] ip_tables: NUMA-aware allocation · 31836064

Committed by Eric Dumazet on Dec 13, 2005

Part of a performance problem with ip_tables is that memory allocation
is not NUMA aware, but 'only' SMP aware (i.e. each CPU normally touches
separate cache lines).

Even with small iptables rules, the cost of this misplacement can be
high on common workloads.  Instead of using one vmalloc() area
(located in the node of the iptables process), we now allocate an area
for each possible CPU, using vmalloc_node() so that memory should be
allocated in the CPU's node if possible.

Port to arp_tables and ip6_tables by Harald Welte.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31836064

[TCP] BIC: CUBIC window growth (2.0) · df3271f3

Committed by Stephen Hemminger on Dec 13, 2005

Replace existing BIC version 1.1 with new version 2.0.
The main change is to replace the window growth function
with a cubic function as described in:
http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/cubic-paper.pdf
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

df3271f3
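
For orientation, a cubic growth curve of this kind can be written out directly. The sketch below uses the form and default constants published later in RFC 8312 (C = 0.4, beta = 0.7); these are assumptions and may differ from the constants in this 2005 patch. `cubic_k` and `cubic_window` are illustrative names, not kernel functions.

```python
def cubic_k(w_max, c=0.4, beta=0.7):
    """Time (seconds) for the curve to climb back to w_max after a loss."""
    return (w_max * (1.0 - beta) / c) ** (1.0 / 3.0)


def cubic_window(t, w_max, c=0.4, beta=0.7):
    """Congestion window t seconds after the last window reduction.

    w_max is the window size just before that reduction.
    """
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max
```

With `w_max = 100` the curve starts at 70 (that is, `beta * w_max`), flattens out at 100 when `t` reaches `cubic_k(100)`, and only then starts probing beyond the old maximum.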

[TCP] BIC: spelling and whitespace · 05d05450

Committed by Stephen Hemminger on Dec 13, 2005

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

05d05450

[TCP] BIC: remove low utilization code. · 018da8f4

Committed by Stephen Hemminger on Dec 13, 2005

The latest BICTCP patch at:
http://www.csc.ncsu.edu:8080/faculty/rhee/export/bitcp/index_files/Page546.htm

disables the low_utilization feature of BICTCP because it doesn't work
in some cases. This patch removes it.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

018da8f4

Dec 20, 2005 · 2 commits

[XFRM]: Handle DCCP in xfrm{4,6}_decode_session · 9e999993

Committed by Patrick McHardy on Dec 19, 2005

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e999993

[NETFILTER]: Fix NAT init order · 0476f171

Committed by Patrick McHardy on Dec 19, 2005

As noticed by Phil Oester, the GRE NAT protocol helper is initialized
before the NAT core, which makes registration fail.

Change the linking order to make NAT be initialized first.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

0476f171

Dec 15, 2005 · 1 commit

[GRE]: Fix hardware checksum modification · 1542272a

Committed by Herbert Xu on Dec 14, 2005

The skb_postpull_rcsum introduced a bug to the checksum modification.
Although the length pulled is offset bytes, the origin of the pulling
is the GRE header, not the IP header.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

1542272a

Dec 13, 2005 · 1 commit

[NETFILTER]: ip_nat_tftp: Fix expectation NAT · 2f9616d4

Committed by Marcus Sundberg on Dec 12, 2005

When a TFTP client is SNATed so that the port is also changed, the
port is never changed back for the expected connection.
Signed-off-by: Marcus Sundberg <marcus@ingate.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f9616d4

Dec 07, 2005 · 3 commits

[TCP] Vegas: timestamp before clone · dfb4b9dc

Committed by David S. Miller on Dec 06, 2005

We have to store the congestion control timestamp on the SKB before we
clone it, not after.  Else we get no timestamping information at all.

tcp_transmit_skb() has been reworked so that we can do the timestamp
still in one spot, instead of at all the call sites.

Problem discovered, and initial fix, from Tom Young
<tyo@ee.unimelb.edu.au>.
Signed-off-by: David S. Miller <davem@davemloft.net>

dfb4b9dc

[TCP] Vegas: Remove extra call to tcp_vegas_rtt_calc · 0d7bef60

Committed by Thomas Young on Dec 06, 2005

Remove unneeded call to tcp_vegas_rtt_calc. The more accurate
microsecond value has already been registered prior to calling
tcp_vegas_cong_avoid.
Signed-off-by: Thomas Young <tyo@ee.mu.oz.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d7bef60

[TCP] Vegas: stop resetting rtt every ack · 5b495613

Committed by Thomas Young on Dec 06, 2005

Move the resetting of rtt measurements to inside the once per RTT
block of code.
Signed-off-by: Thomas Young <tyo@ee.mu.oz.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b495613

Dec 06, 2005 · 6 commits

[NETFILTER]: Don't use conntrack entry after dropping the reference · 2fdf1faa

Committed by Patrick McHardy on Dec 05, 2005

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2fdf1faa

[NETFILTER]: Fix unbalanced read_unlock_bh in ctnetlink · 266c8543

Committed by Patrick McHardy on Dec 05, 2005

NFA_NEST calls NFA_PUT which jumps to nfattr_failure if the skb has no
room left. We call read_unlock_bh at nfattr_failure for the NFA_PUT inside
the locked section, so move NFA_NEST inside the locked section too.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

266c8543

[NETFILTER]: Mark ctnetlink as EXPERIMENTAL · a7957563

Committed by Patrick McHardy on Dec 05, 2005

Should have been marked EXPERIMENTAL from the beginning, as the current
bunch of fixes shows.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7957563

[NETFILTER]: Fix CTA_PROTO_NUM attribute size in ctnetlink · 0be7fa92

Committed by Patrick McHardy on Dec 05, 2005

CTA_PROTO_NUM is a u_int8_t.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

0be7fa92

[NETFILTER]: Fix ip_conntrack_flush abuse in ctnetlink · afe5c6bb

Committed by Patrick McHardy on Dec 05, 2005

ip_conntrack_flush() used to be part of ip_conntrack_cleanup(), which needs
to drop _all_ references on module unload. A table flush using ctnetlink
just needs to clean the table and doesn't need to flush the event cache or
wait for any references attached to skbs. Move everything but pure table
flushing back to ip_conntrack_cleanup().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

afe5c6bb

[NETFILTER]: Fix incorrect argument to ip_nat_initialized() in ctnetlink · 8d1ca699

Committed by Pablo Neira Ayuso on Dec 05, 2005

ip_nat_initialized() takes enum ip_nat_manip_type as its second argument,
not a hook number.

Noticed and initial patch by Marcus Sundberg <marcus@ingate.com>.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d1ca699

Dec 03, 2005 · 2 commits

[IPV4] Fix EPROTONOSUPPORT error in inet_create · 86c8f9d1

Committed by Herbert Xu on Dec 02, 2005

There is a coding error in inet_create that causes it to always return
ESOCKTNOSUPPORT.  It should return EPROTONOSUPPORT when there are
protocols registered for a given socket type but none of them match
the requested protocol.

This is based on a patch by Jayachandran C.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

86c8f9d1
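
The intended distinction between the two error codes can be sketched as follows. This is an illustrative Python model, not the kernel's inet_create: `resolve_protocol` and the `registry` layout are hypothetical, and wildcard protocol matching is left out.

```python
import errno
import os
import socket


def resolve_protocol(registry, sock_type, protocol):
    """Look up a handler for (sock_type, protocol).

    registry maps a socket type to a dict of {protocol: handler}.
    Raises OSError with ESOCKTNOSUPPORT when nothing is registered for
    the socket type, and EPROTONOSUPPORT when the type is known but no
    registered protocol matches.
    """
    protocols = registry.get(sock_type)
    if not protocols:
        code = errno.ESOCKTNOSUPPORT
        raise OSError(code, os.strerror(code))
    if protocol not in protocols:
        code = errno.EPROTONOSUPPORT
        raise OSError(code, os.strerror(code))
    return protocols[protocol]


# Hypothetical registry: only TCP is registered for stream sockets.
REGISTRY = {socket.SOCK_STREAM: {socket.IPPROTO_TCP: "tcp"}}
```

Asking this registry for a UDP stream socket raises EPROTONOSUPPORT, while asking for a datagram socket (no entry for the type at all) raises ESOCKTNOSUPPORT.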

[IGMP]: workaround for IGMP v1/v2 bug · 24c69275

Committed by David Stevens on Dec 02, 2005

From: David Stevens <dlstevens@us.ibm.com>

As explained at:

	http://www.cs.ucsb.edu/~krishna/igmp_dos/

With IGMP version 1 and 2 it is possible to inject a unicast
report to a client which will make it ignore multicast
reports sent later by the router.

The fix is to only accept the report if it was sent to a
multicast or unicast address.
Signed-off-by: David S. Miller <davem@davemloft.net>

24c69275

Dec 02, 2005 · 3 commits

[NETLINK]: Fix processing of fib_lookup netlink messages · ea86575e

Committed by Thomas Graf on Dec 01, 2005

The receive path for fib_lookup netlink messages is lacking sanity
checks for header and payload and is thus vulnerable to malformed
netlink messages causing illegal memory references.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ea86575e

[NETFILTER]: Fix recent match jiffies wrap mismatches · 2a43c4af

Committed by Phil Oester on Dec 01, 2005

Around jiffies wrap time (i.e. within first 5 mins after boot), recent
match rules which contain both --seconds and --hitcount arguments
experience false matches.

This is because the last_pkts array is filled with zeros on creation, and
when comparing 'now' to 0 (+ --seconds argument), time_before_eq thinks it
has found a hit.

Below patch adds a break if the packet value is zero.  This has the
unfortunate side effect of causing mismatches if a packet was received
when jiffies really was equal to zero.  The odds of that happening are
slim compared to the problems caused by not adding the break however.
Plus, the author used this same method just below, so it is "good enough".

This fixes netfilter bugs #383 and #395.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a43c4af
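
The false match is easy to reproduce with a small model of wrap-safe time comparison. The sketch below is Python, not the kernel code: the 32-bit counter width and `HZ = 1000` are assumptions, `count_hits` is a hypothetical stand-in for the hit-counting loop, and `time_before_eq` only mimics the signed-difference comparison that the kernel macro of the same name performs.

```python
BITS = 32                 # assumption: 32-bit jiffies counter
MASK = (1 << BITS) - 1
HZ = 1000                 # assumption: timer ticks per second


def time_before_eq(a, b):
    """Wrap-safe 'a <= b': true when the signed difference a - b is <= 0."""
    diff = (a - b) & MASK
    if diff >= 1 << (BITS - 1):
        diff -= 1 << BITS
    return diff <= 0


def count_hits(now, last_pkts, seconds, skip_zero):
    """Count stored timestamps that fall within `seconds` of `now`.

    With skip_zero=True the loop stops at the first zero entry, which
    models the break added by the patch (zero means "slot never used").
    """
    hits = 0
    for stamp in last_pkts:
        if skip_zero and stamp == 0:
            break
        if time_before_eq(now, (stamp + seconds * HZ) & MASK):
            hits += 1
    return hits
```

Shortly before the counter wraps, `now` is a huge unsigned value that compares as "before" a small deadline such as `0 + seconds * HZ`, so every unused (zeroed) slot counts as a hit unless it is skipped.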

[NETFILTER]: Ignore ACKs on half open connections in TCP conntrack · 73f30602

Committed by Jozsef Kadlecsik on Dec 01, 2005

Mounting NFS file systems after a (warm) reboot could take a long time if
firewalling and connection tracking were enabled.

The reason is that NFS clients tend to use the same ports (800 and
counting down). Now on reboot, the server would still have a TCB for an
existing TCP connection client:800 -> server:2049. The client sends a
SYN from port 800 to server:2049, which elicits an ACK from the server.
The firewall on the client drops the ACK because (from its point of
view) the connection is still in half-open state, and it expects to see
a SYNACK.

The client will eventually time out after several minutes.

The following patch corrects this, by accepting ACKs on half open
connections as well.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

73f30602

Nov 30, 2005 · 4 commits

[NETFILTER] ipv4: small cleanups · d127e94a

Committed by Adrian Bunk on Nov 29, 2005

This patch contains the following cleanups:
- make needlessly global code static
- ip_conntrack_core.c: ip_conntrack_flush() -> ip_conntrack_flush(void)
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

d127e94a

[IPV4]: make two functions static · 4b30b1c6

Committed by Adrian Bunk on Nov 29, 2005

This patch makes two needlessly global functions static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b30b1c6

[NET]: Add const markers to various variables. · 9b5b5cff

Committed by Arjan van de Ven on Nov 29, 2005

The patch below marks various variables const in net/; the goal is to
move them to the .rodata section so that they can't false-share
cachelines with things that get written to, as well as potentially
helping gcc a bit with optimisations.  (These were found using a gcc
patch to warn about such variables.)
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b5b5cff

[IPV4] tcp/route: Another look at hash table sizes · 18955cfc

Committed by Mike Stroyan on Nov 29, 2005

  The tcp_ehash hash table gets too big on systems with really big memory.
It is worse on systems with pages larger than 4KB.  It wastes memory that
could be better used.  It also makes the netstat command slow because reading
/proc/net/tcp and /proc/net/tcp6 needs to go through the full hash table.

  The default value should not be larger for larger page sizes.  It seems
that the effect of page size is an unintended error dating back a long
time.  I also wonder if the default value really should be a larger
fraction of memory for systems with more memory.  While systems with
really big ram can afford more space for hash tables, it is not clear to
me that they benefit from increasing the allocation ratio for this table.

  The amount of memory allocated is determined by net/ipv4/tcp.c:tcp_init and
mm/page_alloc.c:alloc_large_system_hash.

tcp_init calls alloc_large_system_hash passing parameters-
    bucketsize=sizeof(struct tcp_ehash_bucket)
    numentries=thash_entries
    scale=(num_physpages >= 128 * 1024) ? (25-PAGE_SHIFT) : (27-PAGE_SHIFT)
    limit=0

On i386, PAGE_SHIFT is 12 for a page size of 4K
On ia64, PAGE_SHIFT defaults to 14 for a page size of 16K

The num_physpages test above makes the allocation take a larger fraction
of the total memory on systems with larger memory.  The threshold size
for an i386 system is 512MB.  For an ia64 system with 16KB pages the
threshold is 2GB.

For smaller memory systems-
On i386, scale = (27 - 12) = 15
On ia64, scale = (27 - 14) = 13
For larger memory systems-
On i386, scale = (25 - 12) = 13
On ia64, scale = (25 - 14) = 11

  For the rest of this discussion, I'll just track the larger memory case.

  The default behavior has numentries=thash_entries=0, so the allocated
size is determined by either scale or by the default limit of 1/16 of
total memory.

In alloc_large_system_hash-
|	numentries = (flags & HASH_HIGHMEM) ? nr_all_pages : nr_kernel_pages;
|	numentries += (1UL << (20 - PAGE_SHIFT)) - 1;
|	numentries >>= 20 - PAGE_SHIFT;
|	numentries <<= 20 - PAGE_SHIFT;

  At this point, numentries is pages for all of memory, rounded up to the
nearest megabyte boundary.

|	/* limit to 1 bucket per 2^scale bytes of low memory */
|	if (scale > PAGE_SHIFT)
|		numentries >>= (scale - PAGE_SHIFT);
|	else
|		numentries <<= (PAGE_SHIFT - scale);

On i386, numentries >>= (13 - 12), so numentries is 1/8192 of
bytes of total memory.
On ia64, numentries <<= (14 - 11), so numentries is 1/2048 of
bytes of total memory.

|        log2qty = long_log2(numentries);
|
|        do {
|                size = bucketsize << log2qty;

bucketsize is 16, so size is 16 times numentries, rounded
down to a power of two.

On i386, size is 1/512 of bytes of total memory.
On ia64, size is 1/128 of bytes of total memory.

For smaller systems the results are
On i386, size is 1/2048 of bytes of total memory.
On ia64, size is 1/512 of bytes of total memory.

  The large page effect can be removed by just replacing
the use of PAGE_SHIFT with a constant of 12 in the calls to
alloc_large_system_hash.  That makes them more like the other uses of
that function from fs/inode.c and fs/dcache.c
Signed-off-by: David S. Miller <davem@davemloft.net>

18955cfc
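
The arithmetic in this commit message can be checked with a short model. The Python sketch below follows the quoted steps only; it ignores the 1/16-of-memory limit and the allocation retry loop, and `ehash_table_size` together with its `scale_shift` parameter are hypothetical names. Passing `scale_shift=12` models the proposed fix of using a constant 12 instead of PAGE_SHIFT when computing `scale`.

```python
def ehash_table_size(total_bytes, page_shift, scale_shift=None, bucket_size=16):
    """Approximate tcp_ehash size in bytes for a machine of a given size.

    page_shift  -- PAGE_SHIFT of the architecture (12 on i386, 14 on ia64)
    scale_shift -- shift subtracted when computing `scale`; defaults to
                   page_shift, which is the behaviour being criticised
    """
    if scale_shift is None:
        scale_shift = page_shift
    pages = total_bytes >> page_shift
    large_memory = pages >= 128 * 1024
    scale = (25 if large_memory else 27) - scale_shift

    # Round the page count up to a megabyte boundary.
    mb_shift = 20 - page_shift
    numentries = ((pages + (1 << mb_shift) - 1) >> mb_shift) << mb_shift

    # One bucket per 2^scale bytes of memory.
    if scale > page_shift:
        numentries >>= scale - page_shift
    else:
        numentries <<= page_shift - scale

    log2qty = numentries.bit_length() - 1   # round down to a power of two
    return bucket_size << log2qty


GIB = 1 << 30
i386_fraction = GIB // ehash_table_size(GIB, 12)            # 1 GiB, 4 KB pages
ia64_fraction = (4 * GIB) // ehash_table_size(4 * GIB, 14)  # 4 GiB, 16 KB pages
```

For these two large-memory cases the model reproduces the 1/512 and 1/128 fractions stated in the message, and with `scale_shift=12` the ia64 case drops to the same 1/512 as i386.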

Nov 24, 2005 · 1 commit

[NETFILTER]: ip_conntrack_netlink.c needs linux/interrupt.h · de919820

Committed by Benoit Boissinot on Nov 23, 2005

net/ipv4/netfilter/ip_conntrack_netlink.c: In function 'ctnetlink_dump_table':
net/ipv4/netfilter/ip_conntrack_netlink.c:409: warning: implicit declaration of function 'local_bh_disable'
net/ipv4/netfilter/ip_conntrack_netlink.c:427: warning: implicit declaration of function 'local_bh_enable'
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

de919820

Nov 23, 2005 · 1 commit

[NETFILTER] ctnetlink: Fix refcount leak ip_conntrack/nat_proto · 00cb277a

Committed by Pablo Neira Ayuso on Nov 22, 2005

Remove proto == NULL checking since ip_conntrack_[nat_]proto_find_get
always returns a valid pointer.

Fix missing ip_conntrack_proto_put in some paths.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

00cb277a

openeuler / Kernel · synced successfully about 1 year ago
