Commits · 55748ac0468134a89bc55aed6a9691e320caa8a9 · openeuler / Kernel

15 Oct, 2009 2 commits

Phonet: routing table backend · 55748ac0

Authored by Rémi Denis-Courmont on Oct 14, 2009

The Phonet "universe" only has 64 addresses, so we keep a trivial flat
routing table.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

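A flat table for a 64-entry address space can be sketched as a plain array indexed by device address. This is only an illustration of the idea, not the kernel code: the struct and function names are hypothetical, routes are represented by device-name strings, and the mapping from an 8-bit Phonet address to a table slot (dropping the two low bits) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PN_DEVICES 64   /* the whole Phonet "universe" */

/* Hypothetical flat routing table: one slot per device address. */
struct pn_route_table {
    const char *dev[PN_DEVICES];   /* NULL means "no route" */
};

/* Assumption: the upper six bits of the 8-bit address select the device. */
static unsigned int pn_slot(uint8_t addr)
{
    return addr >> 2;
}

/* Install a route; returns -1 if the slot is already taken. */
static int pn_route_add(struct pn_route_table *t, uint8_t addr, const char *dev)
{
    assert(t != NULL && dev != NULL);
    if (t->dev[pn_slot(addr)] != NULL)
        return -1;
    t->dev[pn_slot(addr)] = dev;
    return 0;
}

/* Lookup is a single array access, no hashing or tree walk needed. */
static const char *pn_route_lookup(const struct pn_route_table *t, uint8_t addr)
{
    assert(t != NULL);
    return t->dev[pn_slot(addr)];
}
```

With only 64 slots the table costs a few hundred bytes, which is why a more elaborate structure would buy nothing here.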

Phonet: deliver broadcast packets to broadcast sockets · f14001fc

Authored by Rémi Denis-Courmont on Oct 14, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Oct, 2009 4 commits

tcp: replace ehash_size by ehash_mask · f373b53b

Authored by Eric Dumazet on Oct 09, 2009

Storing the mask (size - 1) instead of the size allows fast path to be
a bit faster.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

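The effect of storing the mask can be illustrated with two bucket-selection helpers. This is a minimal sketch with hypothetical function names, assuming the table size is a power of two; it is not the kernel's lookup code.

```c
#include <assert.h>
#include <stdint.h>

/* With the size stored, every lookup recomputes (size - 1) before masking. */
static uint32_t bucket_with_size(uint32_t hash, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);   /* power of two only */
    return hash & (size - 1);
}

/* With the mask stored, the lookup is a single AND. */
static uint32_t bucket_with_mask(uint32_t hash, uint32_t mask)
{
    assert((mask & (mask + 1)) == 0);                /* mask must be 2^n - 1 */
    return hash & mask;
}
```

Both helpers select the same bucket; the second simply moves the subtraction out of the fast path and into table setup.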

net: Introduce recvmmsg socket syscall · a2e27255

Authored by Arnaldo Carvalho de Melo on Oct 12, 2009

Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.

Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.

This takes into account comments made by:

. Paul Moore: sock_recvmsg is called only for the first datagram,
  sock_recvmsg_nosec is used for the rest.

. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
  works in the same fashion as the ppoll one.

  If the underlying protocol returns a datagram with MSG_OOB set, this
  will make recvmmsg return right away with as many datagrams (+ the OOB
  one) it has received so far.

. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
  datagrams and then recvmsg returns an error, recvmmsg will return
  the successfully received datagrams, store the error and return it
  in the next call.

This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

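From user space, batching receives might look like the sketch below. It assumes the `recvmmsg()` wrapper and `struct mmsghdr` as exposed by glibc on current Linux systems (which require `_GNU_SOURCE`); the helper name `recv_batch` and the buffer limits are hypothetical. `MSG_DONTWAIT` is used instead of a timeout so the call only drains what is already queued.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH_MAX 8     /* hypothetical upper bound on datagrams per call */
#define DGRAM_MAX 64    /* hypothetical per-datagram buffer size */

/*
 * Drain up to `want` queued datagrams from `fd` with one syscall.
 * Returns the number of datagrams received, or -1 with errno set
 * if none could be read. lens[i] receives the length of datagram i.
 */
static int recv_batch(int fd, char bufs[][DGRAM_MAX], unsigned int lens[],
                      unsigned int want)
{
    struct mmsghdr msgs[BATCH_MAX];
    struct iovec iov[BATCH_MAX];
    unsigned int i;
    int n;

    assert(bufs != NULL && lens != NULL);
    if (want > BATCH_MAX)
        want = BATCH_MAX;

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < want; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = DGRAM_MAX;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* One kernel entry for the whole batch instead of one per datagram. */
    n = recvmmsg(fd, msgs, want, MSG_DONTWAIT, NULL);
    for (i = 0; n > 0 && i < (unsigned int)n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}
```

A caller would typically loop on `recv_batch()` until it returns -1 with `EAGAIN`, then go back to waiting in `poll()`.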

net: Generalize socket rx gap / receive queue overflow cmsg · 3b885787

Authored by Neil Horman on Oct 12, 2009

Create a new socket level option to report number of queue overflows

Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames. This value was
exported via a SOL_PACKET level cmsg. After I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option. As such I've created this patch. It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
overflowed between any two given frames. It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count). Tested
successfully by me.

Notes:

1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.

2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if that's
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me. This also saves us having
to code in a per-protocol opt in mechanism.

3) This applies cleanly to net-next assuming that commit
97775007 (my af packet cmsg patch) is reverted.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


mac80211: document ieee80211_rx() context requirement · d20ef63d

Authored by Johannes Berg on Oct 11, 2009

ieee80211_rx() must be called with softirqs disabled
since the networking stack requires this for netif_rx()
and some code in mac80211 can assume that it can not
be processing its own tasklet and this call at the same
time.

It may be possible to remove this requirement after a
careful audit of mac80211 and doing any needed locking
improvements in it along with disabling softirqs around
netif_rx(). An alternative might be to push all packet
processing to process context in mac80211, instead of
to the tasklet, and add other synchronisation.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

12 Oct, 2009 1 commit

net: Fix struct sock bitfield annotation · 5fdb9973

Authored by Eric Dumazet on Oct 08, 2009

Since commit a98b65a3 (net: annotate struct sock bitfield), we lost
8 bytes in struct sock on 64bit arches because of
kmemcheck_bitfield_end(flags) misplacement.

Fix this by putting together sk_shutdown, sk_no_check, sk_userlocks,
sk_protocol and sk_type in the 'flags' 32-bit bitfield.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Oct, 2009 3 commits

udp: dynamically size hash tables at boot time · f86dcc5a

Authored by Eric Dumazet on Oct 07, 2009

UDP_HTABLE_SIZE was initially defined to 128, which is a bit small for
several setups.

4000 active UDP sockets -> 32 sockets per chain on average. An
incoming frame has to look up all sockets to find the best match, so long
chains hurt latency.

Instead of a fixed size hash table that can't be perfect for every
need, let the UDP stack choose its table size at boot time like tcp/ip
route, using the alloc_large_system_hash() helper.

Add an optional boot parameter, uhash_entries=x so that an admin can
force a size between 256 and 65536 if needed, like thash_entries and
rhash_entries.

dmesg logs two new lines :
[    0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
[    0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)

Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non
debugging spinlocks.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

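The figures quoted in this message can be reproduced with a little arithmetic. The helpers below are hypothetical, and the per-slot sizes are inferred from the numbers in the message itself (4096 bytes for 512 entries, 1 MByte for 65536 slots), not read from the kernel source.

```c
#include <assert.h>

/* Average sockets per hash chain, rounded up. */
static unsigned int avg_chain_len(unsigned int sockets, unsigned int slots)
{
    assert(slots != 0);
    return (sockets + slots - 1) / slots;
}

/* Memory used by a table of `slots` entries of `slot_size` bytes each. */
static unsigned long table_bytes(unsigned long slots, unsigned long slot_size)
{
    return slots * slot_size;
}
```

For example, `avg_chain_len(4000, 128)` gives the 32 sockets per chain mentioned above, while the same 4000 sockets spread over 512 slots average 8 per chain; `table_bytes(65536, 16)` gives the 1 MByte maximum.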

cfg80211: add firmware and hardware version to wiphy · dfce95f5

Authored by Kalle Valo on Sep 24, 2009

It's useful to provide firmware and hardware version to user space and have a
generic interface to retrieve them. Users can provide the version information
in bug reports etc.

Add fields for firmware and hardware version to struct wiphy.

(Dropped nl80211 bits for now and modified remaining bits in favor of
ethtool. -- JWL)

Cc: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

wext: refactor · 3d23e349

Authored by Johannes Berg on Sep 29, 2009

Refactor wext to
 * split out iwpriv handling
 * split out iwspy handling
 * split out procfs support
 * allow cfg80211 to have wireless extensions compat code
   w/o CONFIG_WIRELESS_EXT

After this, drivers need to
 - select WIRELESS_EXT	- for wext support
 - select WEXT_PRIV	- for iwpriv support
 - select WEXT_SPY	- for iwspy support

except cfg80211 -- which gets new hooks in wext-core.c
and can then get wext handlers without CONFIG_WIRELESS_EXT.

Wireless extensions procfs support is auto-selected
based on PROC_FS and anything that requires the wext core
(i.e. WIRELESS_EXT or CFG80211_WEXT).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

07 Oct, 2009 4 commits

net: mark net_proto_ops as const · ec1b4cf7

Authored by Stephen Hemminger on Oct 05, 2009

All usages of structure net_proto_ops should be declared const.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: gen_estimator: Dont report fake rate estimators · d250a5f9

Authored by Eric Dumazet on Oct 02, 2009

Jarek Poplawski wrote:
>
>
> Hmm... So you made me to do some "real" work here, and guess what?:
> there is one serious checkpatch warning! ;-) Plus, this new parameter
> should be added to the function description. Otherwise:
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>
> Thanks,
> Jarek P.
>
> PS: I guess full "Don't" would show we really mean it...

Okay :) Here is the last round, before the night !

Thanks again

[RFC] pkt_sched: gen_estimator: Don't report fake rate estimators

We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
is running.

# tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0

User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
one (because no estimator is active)

After this patch, tc command output is :
$ tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

We add a parameter to gnet_stats_copy_rate_est() function so that
it can use gen_estimator_active(bstats, r), as suggested by Jarek.

This parameter can be NULL if the check is not necessary (htb, for
example, has a mandatory rate estimator).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6 sit: 6rd (IPv6 Rapid Deployment) Support. · fa857afc

Authored by YOSHIFUJI Hideaki / 吉藤英明 on Sep 22, 2009

IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
deploy IPv6 unicast service to IPv4 sites to which it provides
customer premise equipment.  Like 6to4, it utilizes stateless IPv6 in
IPv4 encapsulation in order to transit IPv4-only network
infrastructure.  Unlike 6to4, a 6rd service provider uses an IPv6
prefix of its own in place of the fixed 6to4 prefix.

With this option enabled, the SIT driver offers 6rd functionality by
providing an additional ioctl API to configure the IPv6 prefix used
instead of the static 2002::/16 of 6to4.

Original patch was done by Alexandre Cassen <acassen@freebox.fr>
based on old Internet-Draft.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

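The relationship between 6to4 and 6rd addressing can be sketched as a prefix computation: the site prefix is the provider's IPv6 prefix followed by the site's IPv4 address. The function below is a simplified illustration with a hypothetical name; it always embeds all 32 IPv4 bits, and it is not the SIT driver's ioctl interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build the IPv6 prefix delegated to a site: the first prefix_len bits of
 * the provider prefix, followed by the 32 bits of the site's IPv4 address.
 * v4 is in host byte order, e.g. 192.0.2.1 is 0xC0000201.
 */
static void sixrd_site_prefix(const uint8_t prefix[16], unsigned int prefix_len,
                              uint32_t v4, uint8_t out[16])
{
    unsigned int i, bit;

    assert(prefix_len <= 96);   /* the 32 address bits must still fit */
    memset(out, 0, 16);

    /* Copy the provider prefix bit by bit. */
    for (i = 0; i < prefix_len; i++)
        if (prefix[i / 8] & (0x80u >> (i % 8)))
            out[i / 8] |= (uint8_t)(0x80u >> (i % 8));

    /* Append the IPv4 address right after the prefix. */
    for (i = 0; i < 32; i++) {
        bit = prefix_len + i;
        if (v4 & (0x80000000u >> i))
            out[bit / 8] |= (uint8_t)(0x80u >> (bit % 8));
    }
}
```

With the fixed 6to4 prefix 2002::/16 and IPv4 address 192.0.2.1 this yields 2002:c000:201::/48; with a provider prefix such as 2001:db8::/32 (a documentation prefix, used here only as an example) the same address yields 2001:db8:c000:201::/64.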

net: speedup sk_wake_async() · bcdce719

Authored by Eric Dumazet on Oct 06, 2009

An incoming datagram must bring a *lot* of cache lines into the CPU cache,
in particular (other parts omitted: hash chains, IP route cache...):

On 32bit arches :

offsetof(struct sock, sk_rcvbuf) =0x30 (read)
offsetof(struct sock, sk_lock) =0x34 (rw)

offsetof(struct sock, sk_sleep) =0x50 (read)
offsetof(struct sock, sk_rmem_alloc) =0x64 (rw)
offsetof(struct sock, sk_receive_queue)=0x74 (rw)

offsetof(struct sock, sk_forward_alloc)=0x98 (rw)

offsetof(struct sock, sk_callback_lock)=0xcc (rw)
offsetof(struct sock, sk_drops) =0xd8 (read if we add dropcount support, rw if frame dropped)
offsetof(struct sock, sk_filter) =0xf8 (read)

offsetof(struct sock, sk_socket) =0x138 (read)

offsetof(struct sock, sk_data_ready) =0x15c (read)

We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
with no fasync() structures. (socket->fasync_list ptr is probably already in cache
because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)

This avoids one cache line load per incoming packet for common cases (no fasync()).

We can leave (or even move in a future patch) sk->sk_socket in a cold location.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Oct, 2009 2 commits

tunnels: Optimize tx path · 0bfbedb1

Authored by Eric Dumazet on Oct 05, 2009

We currently dirty a cache line to update tunnel device stats
(tx_packets/tx_bytes). We had better use the txq->tx_bytes/tx_packets
counters that are already present in the CPU cache, in the cache
line shared with txq->_xmit_lock.

This patch extends the IPTUNNEL_XMIT() macro to use the txq pointer
provided by the caller.

Also &tunnel->dev->stats can be replaced by &dev->stats.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: fib table algorithm performance improvement · 16c6cf8b

Authored by Stephen Hemminger on Sep 20, 2009

The FIB algorithm for IPv4 is set at compile time, but the kernel goes through
the overhead of function call indirection at runtime. Save some
cycles by turning the indirect calls into direct calls to either
hash or trie code.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Oct, 2009 1 commit

net: Make setsockopt() optlen be unsigned. · b7058842

Authored by David S. Miller on Sep 30, 2009

This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.
Signed-off-by: David S. Miller <davem@davemloft.net>

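The kind of bug this change guards against can be shown with a toy length check. This is a sketch with hypothetical names and a made-up limit, not code from any setsockopt() implementation.

```c
#include <assert.h>

#define OPT_MAX 64   /* hypothetical largest option this handler accepts */

/* Signed length: an upper-bound-only check lets a negative value through. */
static int len_ok_signed(int optlen)
{
    return optlen <= OPT_MAX;
}

/* Unsigned length: the former negative value is now huge and is rejected. */
static int len_ok_unsigned(unsigned int optlen)
{
    return optlen <= OPT_MAX;
}

/* The same caller mistake is accepted by one check and refused by the other. */
static void optlen_demo(void)
{
    assert(len_ok_signed(-1));
    assert(!len_ok_unsigned((unsigned int)-1));
}
```

With the signed variant each handler needs an extra `optlen < 0` test before copying; with the unsigned type the single upper-bound check is enough.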

29 Sep, 2009 1 commit

wext: add back wireless/ dir in sysfs for cfg80211 interfaces · 8f1546ca

Authored by Johannes Berg on Sep 28, 2009

The move away from having drivers assign wireless handlers,
in favour of making cfg80211 assign them, broke the sysfs
registration (the wireless/ dir went missing) because the
handlers are now assigned only after registration, which is
too late.

Fix this by special-casing cfg80211-based devices, all
of which are required to have an ieee80211_ptr, in the
sysfs code, and also using get_wireless_stats() to have
the same values reported as in procfs.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Tested-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

27 Sep, 2009 1 commit

Revert "sit: stateless autoconf for isatap" · d1f8297a

Authored by Sascha Hlusiak on Sep 26, 2009

This reverts commit 64506929.

While the code does not actually break anything, it does not completely follow
RFC5214 yet. After talking back with Fred L. Templin, I agree that completing the
ISATAP specific RS/RA code would pollute the kernel a lot with code that is better
implemented in userspace.

The kernel should not send RS packets for ISATAP at all.
Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Acked-by: Fred L. Templin <Fred.L.Templin@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Sep, 2009 1 commit

tunnel: eliminate recursion field · a43912ab

Authored by Eric Dumazet on Sep 23, 2009

It seems the recursion field from "struct ip_tunnel" is no longer needed.
Recursion prevention is done at the upper level (in dev_queue_xmit()),
since we use HARD_TX_LOCK protection for tunnels.

This avoids a cache line ping pong on "struct ip_tunnel": this structure
should now be mostly read on xmit and receive paths.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Sep, 2009 2 commits

sysctl: remove "struct file *" argument of ->proc_handler · 8d65af78

Authored by Alexey Dobriyan on Sep 23, 2009

It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9p: Add fscache support to 9p · 60e78d2c

Authored by Abhishek Kulkarni on Sep 23, 2009

This patch adds a persistent, read-only caching facility for
9p clients using the FS-Cache caching backend.

When the fscache facility is enabled, each inode is associated
with a corresponding vcookie which is an index into the FS-Cache
indexing tree. The FS-Cache indexing tree is indexed at 3 levels:
- session object associated with each mount.
- inode/vcookie
- actual data (pages)

A cache tag is chosen randomly for each session. These tags can
be read off /sys/fs/9p/caches and can be passed as a mount-time
parameter to re-attach to the specified caching session.
Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>

15 Sep, 2009 5 commits

pkt_sched: Fix tx queue selection in tc_modify_qdisc · 926e61b7

Authored by Jarek Poplawski on Sep 15, 2009

After the recent mq change there is the new select_queue qdisc class
method used in tc_modify_qdisc, but it works OK only for direct child
qdiscs of mq qdisc. Grandchildren always get the first tx queue, which
would give wrong qdisc_root etc. results (e.g. for sch_htb as child of
sch_prio). This patch fixes it by using parent's dev_queue for such
grandchildren qdiscs. The select_queue method's return type is changed
BTW.

With feedback from: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: remap muticast addresses without using dev_close() and dev_open() · 75c78500

Authored by Moni Shoua on Sep 15, 2009

This patch fixes commit e36b9d16. The approach
there is to call dev_close()/dev_open() whenever the device type is changed in
order to remap the device IP multicast addresses to HW multicast addresses.
This approach suffers from 2 drawbacks:

*. It assumes that the device is UP when calling dev_close(), or otherwise
   dev_close() has no effect. It is worth mentioning that initscripts (Redhat)
   and sysconfig (Suse) don't act the same in this matter.
*. dev_close() has other side effects, like deleting entries from the routing
   table, which might be unnecessary.

The fix here is to directly remap the IP multicast addresses to HW multicast
addresses for a bonding device that changes its type, and nothing else.
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix ssthresh u16 leftover · 0b6a05c1

Authored by Ilpo Järvinen on Sep 15, 2009

Once upon a time snd_ssthresh was a 16-bit quantity.
...That has not been true for a long period of time. I ran across
some ancient compares which still seem to trust such legacy.
Put all that magic into a single place; I hopefully found all
of them.

Compile tested, though linking of allyesconfig is ridiculous
nowadays it seems.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: constify struct inet6_protocol · 41135cc8

Authored by Alexey Dobriyan on Sep 14, 2009

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: constify struct net_protocol · 32613090

Authored by Alexey Dobriyan on Sep 14, 2009

Remove long removed "inet_protocol_base" declaration.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Sep, 2009 1 commit

net_sched: fix estimator lock selection for mq child qdiscs · 23bcf634

Authored by Patrick McHardy on Sep 09, 2009

When new child qdiscs are attached to the mq qdisc, they are actually
attached as root qdiscs to the device queues. The lock selection for
new estimators incorrectly picks the root lock of the existing and
to be replaced qdisc, which results in a use-after-free once the old
qdisc has been destroyed.

Mark mq qdisc instances with a new flag and treat qdiscs attached to
mq as children similar to regular root qdiscs.

Additionally prevent estimators from being attached to the mq qdisc
itself since it only updates its byte and packet counters during dumps.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Sep, 2009 2 commits

net_sched: add classful multiqueue dummy scheduler · 6ec1c69a

Authored by David S. Miller on Sep 06, 2009

This patch adds a classful dummy scheduler which can be used as root qdisc
for multiqueue devices and exposes each device queue as a child class.

This allows addressing queues individually and grafting them similarly to regular
classes. Additionally it presents an accumulated view of the statistics of
all real root qdiscs in the dummy root.

Two new callbacks are added to the qdisc_ops and qdisc_class_ops:

- cl_ops->select_queue selects the tx queue number for new child classes.

- qdisc_ops->attach() overrides root qdisc device grafting to attach
  non-shared qdiscs to the queues.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: move dev_graft_qdisc() to sch_generic.c · 589983cd

Authored by Patrick McHardy on Sep 04, 2009

It will be used in a following patch by the multiqueue qdisc.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Sep, 2009 9 commits

sctp: turn flags in 'struct sctp_association' into bit fields · 9237ccbc

Authored by Wei Yongjun on Sep 04, 2009

This shrinks the size of struct sctp_association a little.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

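The size saving from bit fields can be illustrated with two layouts of the same eight flags. The struct and field names below are hypothetical and do not reproduce the real struct sctp_association; exact sizes depend on the compiler and ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Eight on/off flags stored one per byte. */
struct flags_as_bytes {
    char ipv4_ok, ipv6_ok, hostname_ok, sack_needed;
    char asconf_capable, prsctp_capable, auth_capable, temp;
};

/* The same flags packed as one-bit fields in a single storage unit. */
struct flags_as_bits {
    unsigned int ipv4_ok:1, ipv6_ok:1, hostname_ok:1, sack_needed:1;
    unsigned int asconf_capable:1, prsctp_capable:1, auth_capable:1, temp:1;
};

/* Bit fields are read and written like ordinary members. */
static void mark_sack_needed(struct flags_as_bits *f)
{
    assert(f != NULL);
    f->sack_needed = 1;
}
```

On common ABIs the byte-per-flag layout takes 8 bytes and the bit-field layout 4, at the cost of slightly more work per access and of no longer being able to take a flag's address.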

sctp: Sysctl configuration for IPv4 Address Scoping · 72388433

Authored by Bhaskar Dutta on Sep 03, 2009

This patch introduces a new sysctl option to make IPv4 Address Scoping
configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.

In networking environments where DNAT rules in iptables prerouting
chains convert destination IP's to link-local/private IP addresses,
SCTP connections fail to establish as the INIT chunk is dropped by the
kernel due to address scope match failure.
For example to support overlapping IP addresses (same IP address with
different vlan id) a Layer-5 application listens on link local IP's,
and there is a DNAT rule that maps the destination IP to a link local
IP. Such applications never get the SCTP INIT if the address-scoping
draft is strictly followed.

This sysctl configuration allows SCTP to function in such
unconventional networking environments.

Sysctl options:
0 - Disable IPv4 address scoping draft altogether
1 - Enable IPv4 address scoping (default, current behavior)
2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
3 - Enable address scoping but allow IPv4 link local address in init/init-ack
Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Turn flags in 'sctp_packet' into bit fields · a803c942

Authored by Vlad Yasevich on Sep 04, 2009

This shrinks the size of sctp_packet a little.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Fix SCTP_MAXSEG socket option to comply to spec. · f68b2e05

Authored by Vlad Yasevich on Sep 04, 2009

We had a bug that we never stored the user-defined value for
MAXSEG when setting the value on an association.  Thus future
PMTU events ended up re-writing the frag point and increasing
it past user limit.  Additionally, when setting the option on
the socket/endpoint, we affect all current associations, which
is against spec.

Now, we store the user 'maxseg' value along with the computed
'frag_point'.  We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when it's
set.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Don't do NAGLE delay on large writes that were fragmented small · cb95ea32

Authored by Vlad Yasevich on Sep 04, 2009

SCTP will delay the last part of a large write due to NAGLE, if that
part is smaller than MTU. Since we are doing large writes, we might
as well send the last portion now instead of waiting until the next
large write happens. The small portion will be sent as is regardless,
so it's better to not delay it.

This is a result of much discussion with Wei Yongjun <yjwei@cn.fujitsu.com>
and Doug Graham <dgraham@nortel.com>. Many thanks go out to them.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: drop a_rwnd to 0 when receive buffer overflows. · 4d3c46e6

Authored by Vlad Yasevich on Sep 04, 2009

SCTP has a problem that when small chunks are used, it is possible
to exhaust the receiver buffer without fully closing receive window.
This happens due to all the overhead that we have to account for with small
messages. To fix this, when receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion. When application starts
reading data and freeing up receive buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
mtu at a time. This worked well in testing and under stress produced
rather even recovery.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Send user messages to the lower layer as one · 9c5c62be

Authored by Vlad Yasevich on Aug 10, 2009

Currently, sctp breaks up user messages into fragments and
sends each fragment to the lower layer by itself.  This means
that for each fragment we go all the way down the stack
and back up.  This also discourages bundling of multiple
fragments when they can fit into a single packet (ex: due
to user setting a low fragmentation threshold).

We introduce a new command SCTP_CMD_SND_MSG and hand the
whole message down state machine.  The state machine and
the side-effect parser will cork the queue, add all chunks
from the message to the queue, and then un-cork the queue
thus causing the chunks to get transmitted.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Disallow new connection on a closing socket · bec9640b

Authored by Vlad Yasevich on Jul 30, 2009

If a socket has a lot of associations that are in the process
of being closed/aborted, it is possible for a remote to establish
new associations during the time period that the old ones are shutting
down. If this was a result of a close() call, there will be no socket
and will cause a memory leak. We'll prevent this by setting the
socket state to CLOSING and disallow new associations when in this state.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: remove unused union (sctp_cmsg_data_t) definition · b4e8c6a7

Authored by Rami Rosen on Jul 30, 2009

This patch removes an unused union definition (sctp_cmsg_data_t)
from include/net/sctp/user.h.
Signed-off-by: Rami Rosen <rosenrami@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

03 Sep, 2009 1 commit

tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076

Authored by Wu Fengguang on Sep 02, 2009

This fixed a lockdep warning which appeared when doing stress
memory tests over NFS:

	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock

	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
			tcp_send_fin => alloc_skb_fclone => page reclaim

David raised a concern that if the allocation fails in tcp_send_fin(), and it's
GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
for the allocation to succeed.

But the fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
weird, but it is no worse than the implicit sleep inside GFP_KERNEL. Both could
loop endlessly under memory pressure.

CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: David S. Miller <davem@davemloft.net>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openeuler / Kernel synced successfully more than 1 year ago