Commits · 339bf98ffc6a8d8eb16fc532ac57ffbced2f8a68 · openeuler / raspberrypi-kernel

03 Dec, 2006 16 commits

[NETLINK]: Do precise netlink message allocations where possible · 339bf98f

Authored by Thomas Graf on Nov 10, 2006

Account for the netlink message header size directly in nlmsg_new()
instead of relying on the caller to calculate it correctly.

Replaces error handling of message construction functions when
constructing notifications with bug traps since a failure implies
a bug in calculating the size of the skb.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
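As a rough illustration of what "accounting for the header size" means, the sketch below computes the buffer size needed for a given payload. The helper names are hypothetical, and the 16-byte header and 4-byte alignment are written out as constants so the example builds without kernel headers; it is not the code from the patch.

```c
#include <stddef.h>

/* Assumed wire-format constants: a 16-byte netlink header, 4-byte alignment. */
#define SKETCH_NLMSG_HDRLEN ((size_t)16)

/* Round len up to the next multiple of 4. */
static size_t sketch_align4(size_t len)
{
    return (len + 3) & ~(size_t)3;
}

/* Hypothetical helper: bytes to allocate for a message carrying `payload`
 * bytes, with the header added here rather than by every caller. */
static size_t sketch_nlmsg_total_size(size_t payload)
{
    return sketch_align4(SKETCH_NLMSG_HDRLEN + payload);
}
```

With the header folded into the helper, a caller only passes the payload size, so it cannot forget the header when sizing the buffer.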

[TCP]: Remove dead code in init_sequence · a94f723d

Authored by Gerrit Renker on Nov 10, 2006

This removes two redundancies:

1) The test (skb->protocol == htons(ETH_P_IPV6)) in tcp_v6_init_sequence()
   is always true, due to
	* tcp_v6_conn_request() is the only function calling this one
	* tcp_v6_conn_request() redirects all skb's with ETH_P_IP protocol to
	  tcp_v4_conn_request() [ cf. top of tcp_v6_conn_request()]

2) The first argument, `struct sock *sk' of tcp_v{4,6}_init_sequence() is
   never used.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Don't set SKB owner in tcp_transmit_skb(). · 93173112

Authored by David S. Miller on Nov 09, 2006

The data itself is already charged to the SKB, doing
the skb_set_owner_w() just generates a lot of noise and
extra atomics we don't really need.

Lmbench improvements on lat_tcp are minimal:

before:
TCP latency using localhost: 23.2701 microseconds
TCP latency using localhost: 23.1994 microseconds
TCP latency using localhost: 23.2257 microseconds

after:
TCP latency using localhost: 22.8380 microseconds
TCP latency using localhost: 22.9465 microseconds
TCP latency using localhost: 22.8462 microseconds

Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Allow autoloading of congestion control via setsockopt. · 35bfbc94

Authored by Stephen Hemminger on Nov 09, 2006

If the user has permission to load modules, then attempt to autoload
the TCP congestion control module.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Restrict congestion control choices. · ce7bc3bf

Authored by Stephen Hemminger on Nov 09, 2006

Allow normal users to only choose among a restricted set of congestion
control choices.  The default is reno and whatever has been configured
as default. But the policy can be changed by the administrator at any time.

For example, to allow any choice:
    cp /proc/sys/net/ipv4/tcp_available_congestion_control \
       /proc/sys/net/ipv4/tcp_allowed_congestion_control

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Add tcp_available_congestion_control sysctl. · 3ff825b2

Authored by Stephen Hemminger on Nov 09, 2006

Create /proc/sys/net/ipv4/tcp_available_congestion_control
that reflects currently available TCP choices.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Size listen hash tables using backlog hint · 72a3effa

Authored by Eric Dumazet on Nov 16, 2006

We currently allocate a fixed-size hash table (TCP_SYNQ_HSIZE=512 slots) for
each LISTEN socket, regardless of various parameters (the listen backlog, for
example).

On x86_64, this means order-1 allocations (might fail), even for 'small'
sockets, expecting few connections. On the contrary, a huge server wanting a
backlog of 50000 is slowed down a bit because of this fixed limit.

This patch makes the sizing of the listen hash table a dynamic parameter,
depending on:
- net.core.somaxconn tunable (default is 128)
- net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128)
- backlog value given by user application  (2nd parameter of listen())

For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of
kmalloc().

We still limit memory allocation with the two existing tunables (somaxconn &
tcp_max_syn_backlog). So for standard setups, this patch actually reduces RAM
usage.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
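A minimal sketch of how a table size could be derived from the three inputs named in this commit message. The function names, taking the minimum of the three values, the floor of 8 entries, the `+ 1`, and the rounding to a power of two are all assumptions made for the example; they are not taken from the patch itself.

```c
#include <stddef.h>

/* Round n up to the next power of two (capped to avoid shifting past 2^31). */
static unsigned int sketch_roundup_pow2(unsigned int n)
{
    unsigned int p = 1;

    if (n > (1U << 31))
        return 1U << 31;
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical sizing rule: clamp the application's backlog by both
 * tunables, apply an assumed small floor, then round up. */
static unsigned int sketch_listen_table_entries(unsigned int backlog,
                                                unsigned int somaxconn,
                                                unsigned int max_syn_backlog)
{
    unsigned int n = backlog;

    if (n > somaxconn)
        n = somaxconn;
    if (n > max_syn_backlog)
        n = max_syn_backlog;
    if (n < 8)
        n = 8; /* assumed minimum, for illustration only */
    return sketch_roundup_pow2(n + 1);
}

/* The message says allocations bigger than PAGE_SIZE use vmalloc(). */
static int sketch_use_vmalloc(size_t bytes, size_t page_size)
{
    return bytes > page_size;
}
```

Under these assumptions a small backlog yields a small table, and a backlog of 50000 is still bounded by the two tunables, which matches the memory-limiting behaviour the message describes.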

[NET] rules: Share common attribute validation policy · 1f6c9557

Authored by Thomas Graf on Nov 09, 2006

Move the attribute policy for the non-specific attributes into
net/fib_rules.h and include it in the respective protocols.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET] rules: Protocol independant mark selector · b8964ed9

Authored by Thomas Graf on Nov 09, 2006

Move the mark selector, currently implemented per protocol, into
the protocol independent part.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4] nl_fib_lookup: Rename fl_fwmark to fl_mark · 5f300893

Authored by Thomas Graf on Nov 09, 2006

For the sake of consistency.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Rethink mark field in struct flowi · 47dcf0cb

Authored by Thomas Graf on Nov 09, 2006

Now that all protocols have been made aware of the mark
field it can be moved out of the union, thus simplifying
its usage.

The config options in the IPv4/IPv6/DECnet subsystems
to enable or disable mark based routing only
obfuscate the code with ifdefs, the cost for the
additional comparison in the flow key is insignificant,
and most distributions have all these options enabled
by default anyway. Therefore it makes sense to remove
the config options and enable mark based routing by
default.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Turn nfmark into generic mark · 82e91ffe

Authored by Thomas Graf on Nov 09, 2006

nfmark is being used in various subsystems and has become
the de facto mark field for all kinds of packets. Therefore
it makes sense to rename it to `mark' and remove the
dependency on CONFIG_NETFILTER.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

SELinux: Return correct context for SO_PEERSEC · 6b877699

Authored by Venkat Yekkirala on Nov 08, 2006

Fix SO_PEERSEC for tcp sockets to return the security context of
the peer (as represented by the SA from the peer) as opposed to the
SA used by the local/source socket.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>

[IPV4]: encapsulation annotations · d5a0a1e3

Authored by Al Viro on Nov 08, 2006

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

[XFRM]: misc annotations · 8c689a6e

Authored by Al Viro on Nov 08, 2006

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: ipconfig and nfsroot annotations · 5a874db4

Authored by Al Viro on Nov 08, 2006

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Nov, 2006 3 commits

[NETFILTER]: ipt_REJECT: fix memory corruption · af443b6d

Authored by Patrick McHardy on Nov 28, 2006

On devices with hard_header_len > LL_MAX_HEADER ip_route_me_harder()
reallocates the skb, leading to memory corruption when using the stale
tcph pointer to update the checksum.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
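The class of bug described here (a pointer into a buffer that is kept across a call which may reallocate that buffer) can be sketched in plain userspace C. The structure and function names below are hypothetical and use realloc() as a stand-in for the skb reallocation; this is not the netfilter code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a packet buffer with a header at hdr_off. */
struct sketch_buf {
    unsigned char *data;
    size_t len;
    size_t hdr_off;
};

/* Grows the buffer; like a reallocating kernel helper, this may move data. */
static int sketch_grow(struct sketch_buf *b, size_t extra)
{
    unsigned char *p = realloc(b->data, b->len + extra);

    if (p == NULL)
        return -1;
    b->data = p;
    b->len += extra;
    return 0;
}

/* Safe pattern: derive the header pointer only after the call that may
 * reallocate, instead of reusing a pointer computed before it. */
static int sketch_update_field(struct sketch_buf *b, size_t extra,
                               unsigned char value)
{
    unsigned char *hdr;

    if (sketch_grow(b, extra) != 0)
        return -1;
    hdr = b->data + b->hdr_off; /* recomputed, never stale */
    hdr[0] = value;
    return 0;
}
```

Storing an offset and recomputing the pointer after every potentially reallocating call is one common way to avoid writing through a stale pointer.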

[NETFILTER]: conntrack: fix refcount leak when finding expectation · 2e47c264

Authored by Yasuyuki Kozakai on Nov 27, 2006

None of the users of __{ip,nf}_conntrack_expect_find() expect it to
increment the reference count of the expectation.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: ctnetlink: fix reference count leak · c537b75a

Authored by Patrick McHardy on Nov 27, 2006

When NFA_NEST exceeds the skb size the protocol reference is leaked.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Nov, 2006 3 commits

[NET]: Fix kfifo_alloc() error check. · ac16ca64

Authored by Akinobu Mita on Nov 22, 2006

The return value of kfifo_alloc() should be checked by IS_ERR().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
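The reason a NULL check is not enough for such a function is that an error is encoded in the returned pointer value itself. The userspace sketch below imitates that convention; every `sketch_` name is hypothetical and only mirrors the idea behind the kernel's ERR_PTR()/IS_ERR() helpers, it does not use them.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer in the top of the address range. */
static void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static long sketch_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True when p is an encoded error rather than a usable pointer. */
static int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical allocator that reports failure as an error pointer and
 * never returns NULL on the error path. */
static void *sketch_alloc(size_t size)
{
    void *p;

    if (size == 0)
        return sketch_err_ptr(-EINVAL);
    p = malloc(size);
    return p != NULL ? p : sketch_err_ptr(-ENOMEM);
}
```

A caller that only tests the result against NULL would treat the encoded error as a valid pointer, which is the mistake this kind of fix addresses.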

[UDP]: Make udp_encap_rcv use pskb_may_pull · 753eab76

Authored by Olaf Kirch on Nov 22, 2006

Make udp_encap_rcv use pskb_may_pull

IPsec with NAT-T breaks on some notebooks using the latest e1000 chipset,
when header split is enabled. When receiving sufficiently large packets, the
driver puts everything up to and including the UDP header into the header
portion of the skb, and the rest goes into the paged part. udp_encap_rcv
forgets to use pskb_may_pull, and fails to decapsulate it. Instead, it
passes it up to the IKE daemon.

Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: H.323 conntrack: fix crash with CONFIG_IP_NF_CT_ACCT · 38f7efd5

Authored by Faidon Liambotis on Nov 21, 2006

H.323 connection tracking code calls ip_ct_refresh_acct() when
processing RCFs and URQs but passes NULL as the skb.
When CONFIG_IP_NF_CT_ACCT is enabled, the connection tracking core tries
to dereference the skb, which results in an obvious panic.
A similar fix was applied on the SIP connection tracking code some time
ago.

Signed-off-by: Faidon Liambotis <paravoid@debian.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Nov, 2006 2 commits

[TCP]: Fix up sysctl_tcp_mem initialization. · 52bf376c

Authored by John Heffner on Nov 14, 2006

Fix up tcp_mem initial settings to take into account the size of the
hash entries (different on SMP and non-SMP systems).

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: Use pskb_trim in {ip,ip6,nfnetlink}_queue · d8a585d7

Authored by Patrick McHardy on Nov 14, 2006

Based on patch by James D. Nurmi:

I've got some code very dependent on nfnetlink_queue, and turned up a
large number of warns coming from skb_trim.  While it's quite possibly
my code, having not seen it on older kernels made me a bit suspect.

Anyhow, based on some googling I turned up this thread:
http://lkml.org/lkml/2006/8/13/56

And believe the issue to be related, so attached is a small patch to
the kernel -- not sure if this is completely correct, but for anyone
else hitting the WARN_ON(1) in skbuff.h, it might be helpful..

Signed-off-by: James D. Nurmi <jdnurmi@gmail.com>

Ported to ip6_queue and nfnetlink_queue and added return value
checks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Nov, 2006 1 commit

[IPVS]: More endianness fixed. · bb831eb2

Authored by Julian Anastasov on Nov 10, 2006

- make sure port in FTP data is in network order (in fact it was looking
  buggy for big endian boxes before Viro's changes)
- htonl -> htons for port

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
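To illustrate why a 16-bit port calls for htons() rather than htonl(), here is a small userspace sketch that writes a port into a two-byte wire field. The function name is hypothetical; htons() is the standard call from `<arpa/inet.h>`.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a host-order port as two network-order bytes. */
static void sketch_put_port(unsigned char out[2], uint16_t host_port)
{
    /* htons() converts exactly 16 bits. Using htonl() here and truncating
     * the result to 16 bits would keep the wrong half of the 32-bit value
     * on little-endian hosts. */
    uint16_t wire = htons(host_port);

    memcpy(out, &wire, sizeof(wire));
}
```

After the conversion the most significant byte of the port comes first in the buffer regardless of host endianness.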

08 Nov, 2006 1 commit

[TCP]: Don't use highmem in tcp hash size calculation. · 9e950efa

Authored by John Heffner on Nov 06, 2006

This patch removes consideration of high memory when determining TCP
hash table sizes.  Taking into account high memory results in tcp_mem
values that are too large.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Nov, 2006 1 commit

[TCP]: Set default congestion control when no sysctl. · b1736a71

Authored by Stephen Hemminger on Oct 31, 2006

The setting of the default congestion control was buried in
the sysctl code so it would not be done properly if SYSCTL was
not enabled.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Oct, 2006 5 commits

[NetLabel]: protect the CIPSOv4 socket option from setsockopt() · f8687afe

Authored by Paul Moore on Oct 30, 2006

This patch makes two changes to protect applications from either removing or
tampering with the CIPSOv4 IP option on a socket. The first is the requirement
that applications have the CAP_NET_RAW capability to set an IPOPT_CIPSO option
on a socket; this prevents untrusted applications from setting their own
CIPSOv4 security attributes on the packets they send. The second change is to
SELinux and it prevents applications from setting any IPv4 options when there
is an IPOPT_CIPSO option already present on the socket; this prevents
applications from removing CIPSOv4 security attributes from the packets they
send.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: ip_tables: compat code module refcounting fix · 920b868a

Authored by Dmitry Mishin on Oct 30, 2006

This patch fixes a bug in iptables module refcounting on the compat error path.

As we are getting modules in check_compat_entry_size_and_hooks(), in case of
a later error, we should put them all in translate_compat_table(), not in the
compat_copy_entry_from_user() or compat_copy_match_from_user(), as it is now.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Vasily Averin <vvs@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: ip_tables: compat error way cleanup · ef4512e7

Authored by Vasily Averin on Oct 30, 2006

This patch adds the forgotten compat_flush_offset() call to the error path of
translate_compat_table().  Its absence may lead to table corruption on the
next compat_do_replace().

Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: Missed and reordered checks in {arp,ip,ip6}_tables · 590bdf7f

Authored by Dmitry Mishin on Oct 30, 2006

There are a number of issues in parsing the user-provided table in
translate_table(). A malicious user with CAP_NET_ADMIN may crash the system by
passing a specially crafted table to the *_tables.

The first issue is that the mark_source_chains() function is called before the
entry content checks. In the case of a standard target, mark_source_chains()
uses the t->verdict field to determine the new position. But the check that
this field leads no further than the table end is in check_entry(), which
is called later than mark_source_chains().

The second issue is that there is no check that target_offset points inside
the entry. If it does not, the *_ITERATE_MATCH macro will follow further than
the entry ends. As a result, we'll have an oops or memory disclosure.

The third issue is that there is no check that the target is completely
inside the entry. The results are the same as in the previous issue.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
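The three checks described in this message can be sketched on a simplified record layout. The structure, field widths, and function names below are hypothetical stand-ins chosen for the example; they do not reproduce the real table entry structures or the patch.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a table entry header. */
struct sketch_entry {
    size_t target_offset; /* offset of the target from the entry start */
    size_t next_offset;   /* total size of the entry */
};

/* Returns 0 when a target header of target_hdr_size bytes lies fully
 * inside the entry, -1 otherwise. */
static int sketch_check_entry(const struct sketch_entry *e,
                              size_t target_hdr_size)
{
    if (e->target_offset < sizeof(*e))     /* target overlaps the entry header */
        return -1;
    if (e->target_offset > e->next_offset) /* target_offset points outside */
        return -1;
    if (e->next_offset - e->target_offset < target_hdr_size)
        return -1;                         /* target not completely inside */
    return 0;
}

/* Assumption for the sketch: a non-negative verdict is a jump offset and
 * must stay inside the table; negative verdicts are not offsets. */
static int sketch_check_verdict(long verdict, size_t table_size)
{
    if (verdict >= 0 && (size_t)verdict >= table_size)
        return -1;
    return 0;
}
```

The ordering matters as much as the checks themselves: validation of this kind has to run before any code that follows the offsets.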

[NET]: fix uaccess handling · a27b58fe

Authored by Heiko Carstens on Oct 30, 2006

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Oct, 2006 2 commits

[TCP] H-TCP: fix integer overflow · 2a272f98

Authored by Gavin McCullagh on Oct 25, 2006

When using H-TCP with a single flow on a 500Mbit connection (or less
actually), alpha can exceed 65000, so alpha needs to be a u32.

Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie>
Signed-off-by: Doug Leith <doug.leith@nuim.ie>
Signed-off-by: David S. Miller <davem@davemloft.net>
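A small sketch of the overflow, assuming the field was previously 16 bits wide (the message only says it needs to be a u32). The function names are hypothetical and the value 70000 is just an example above the 16-bit limit of 65535.

```c
#include <stdint.h>

/* Storing into a 16-bit field silently keeps only the low 16 bits. */
static uint16_t sketch_store_alpha_u16(uint32_t alpha)
{
    return (uint16_t)alpha;
}

/* A 32-bit field holds the same value unchanged. */
static uint32_t sketch_store_alpha_u32(uint32_t alpha)
{
    return alpha;
}
```

For an input of 70000 the 16-bit variant wraps to 70000 - 65536 = 4464, while the 32-bit variant keeps 70000.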

[TCP] cubic: scaling error · 22119240

Authored by Stephen Hemminger on Oct 25, 2006

Doug Leith observed a discrepancy between the version of CUBIC described
in the papers and the version in 2.6.18. A math error related to scaling
causes Cubic to grow too slowly.

Patch is from "Sangtae Ha" <sha2@ncsu.edu>. I validated that
it does fix the problems.

The following plots show the behavior over a 500 ms, 100 Mbit link.

Sender (2.6.19-rc3) --- Bridge (2.6.18-rt7) ------- Receiver (2.6.19-rc3)
1G [netem] 100M

http://developer.osdl.org/shemminger/tcp/2.6.19-rc3/cubic-orig.png
http://developer.osdl.org/shemminger/tcp/2.6.19-rc3/cubic-fix.png

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Oct, 2006 1 commit

[IPV4] ipconfig: fix RARP ic_servaddr breakage · 82571026

Authored by Al Viro on Oct 24, 2006

memcpy 4 bytes to address of auto unsigned long variable followed
by comparison with u32 is a bloody bad idea.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
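Why this is a bad idea can be shown with a userspace sketch: where unsigned long is wider than 32 bits, copying 4 bytes leaves the remaining bytes of the variable untouched, so the later comparison depends on whatever was there before. The function names are hypothetical, and the `stale` parameter simulates leftover stack contents so the behaviour is deterministic.

```c
#include <stdint.h>
#include <string.h>

/* Problematic pattern: only 4 bytes of the (possibly 8-byte) variable
 * are overwritten; `stale` stands in for uninitialized stack contents. */
static int sketch_match_long(const unsigned char wire[4], uint32_t expected,
                             unsigned long stale)
{
    unsigned long v = stale;

    memcpy(&v, wire, 4);
    return v == expected;
}

/* Safer pattern: the destination has exactly the size being copied. */
static int sketch_match_u32(const unsigned char wire[4], uint32_t expected)
{
    uint32_t v;

    memcpy(&v, wire, sizeof(v));
    return v == expected;
}
```

On platforms where unsigned long is 32 bits wide both variants behave the same, which is presumably why such code can go unnoticed until it runs on a 64-bit machine.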

20 Oct, 2006 2 commits

[TCP]: One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err() · 06ca719f

Authored by Eric Dumazet on Oct 20, 2006

I believe this NET_INC_STATS() call can be replaced by
NET_INC_STATS_BH(), a little bit cheaper.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: Missing check for CAP_NET_ADMIN in iptables compat layer · 82fac054

Authored by Björn Steinbrink on Oct 20, 2006

The 32bit compatibility layer has no CAP_NET_ADMIN check in
compat_do_ipt_get_ctl, which for example allows listing the current
iptables rules even without having that capability (the non-compat
version requires it). Other capabilities might be required to exploit
the bug (eg. CAP_NET_RAW to get the nfnetlink socket?), so a plain user
can't exploit it, but a setup actually using the posix capability system
might very well hit such a constellation of granted capabilities.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Oct, 2006 2 commits

[TCP]: Bound TSO defer time · ae8064ac

Authored by John Heffner on Oct 18, 2006

This patch limits the amount of time you will defer sending a TSO segment
to less than two clock ticks, or the time between two acks, whichever is
longer.

On slow links, deferring causes significant bursts.  See attached plots,
which show RTT through a 1 Mbps link with a 100 ms RTT and ~100 ms queue
for (a) non-TSO, (b) current TSO, and (c) patched TSO.  This burstiness
causes significant jitter, tends to overflow queues early (bad for short
queues), and makes delay-based congestion control more difficult.

I believe deferring by a couple of clock ticks will have a relatively small
impact on performance.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPv4] fib: Remove unused fib_config members · b52f070c

Authored by Thomas Graf on Oct 18, 2006

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Oct, 2006 1 commit

[NET]: reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire() · 4663afe2

Authored by Eric Dumazet on Oct 12, 2006

1) shrink struct inet_peer on 64 bits platforms.