提交 · 313c86da2d1530f3ff631235ce2a0fdea8f3f1ab · openanolis / cloud-kernel

09 1月, 2018 2 次提交

sctp: fix the handling of ICMP Frag Needed for too small MTUs · b6c5734d

由 Marcelo Ricardo Leitner 提交于 1月 05, 2018

syzbot reported a hang involving SCTP, on which it kept flooding dmesg
with the message:
[  246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
low, using default minimum of 512

That happened because whenever SCTP hits an ICMP Frag Needed, it tries
to adjust to the new MTU and triggers an immediate retransmission. But
it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
allowed (512) would not cause the PMTU to change, and issued the
retransmission anyway (thus leading to another ICMP Frag Needed, and so
on).

As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
are higher than that, sctp_transport_update_pmtu() is changed to
re-fetch the PMTU that got set after our request, and with that, detect
if there was an actual change or not.

The fix, thus, skips the immediate retransmission if the received ICMP
resulted in no change, in the hope that SCTP will select another path.

Note: The value being used for the minimum MTU (512,
SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
SCTP_MIN_PMTU), but such change belongs to another patch.

Changes from v1:
- do not disable PMTU discovery, in the light of commit
06ad3919 ("[SCTP] Don't disable PMTU discovery when mtu is small")
and as suggested by Xin Long.
- changed the way to break the rtx loop by detecting if the icmp
  resulted in a change or not
Changes from v2:
none

See-also: https://lkml.org/lkml/2017/12/22/811Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b6c5734d

sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled · cc35c3d1

由 Marcelo Ricardo Leitner 提交于 1月 05, 2018

Currently, if PMTU discovery is disabled on a given transport, but the
configured value is higher than the actual PMTU, it is likely that we
will get some icmp Frag Needed. The issue is, if PMTU discovery is
disabled, we won't update the information and will issue a
retransmission immediately, which may very well trigger another ICMP,
and another retransmission, leading to a loop.

The fix is to simply not trigger immediate retransmissions if PMTU
discovery is disabled on the given transport.

Changes from v2:
- updated stale comment, noticed by Xin Long
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc35c3d1

29 10月, 2017 1 次提交

sctp: fix some type cast warnings introduced by transport rhashtable · 8d32503e

由 Xin Long 提交于 10月 28, 2017

These warnings were found by running 'make C=2 M=net/sctp/'.

They are introduced by not aware of Endian for the port when
coding transport rhashtable patches.

Fixes: 7fda702f ("sctp: use new rhlist interface on sctp transport rhashtable")
Reported-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d32503e

20 10月, 2017 1 次提交

sctp: add the missing sock_owned_by_user check in sctp_icmp_redirect · 1cc276ce

由 Xin Long 提交于 10月 18, 2017

Now sctp processes icmp redirect packet in sctp_icmp_redirect where
it calls sctp_transport_dst_check in which tp->dst can be released.

The problem is before calling sctp_transport_dst_check, it doesn't
check sock_owned_by_user, which means tp->dst could be freed while
a process is accessing it with owning the socket.

An use-after-free issue could be triggered by this.

This patch is to fix it by checking sock_owned_by_user before calling
sctp_transport_dst_check in sctp_icmp_redirect, so that it would not
release tp->dst if users still hold sock lock.

Besides, the same issue fixed in commit 45caeaa5 ("dccp/tcp: fix
routing redirect race") on sctp also needs this check.

Fixes: 55be7a9c ("ipv4: Add redirect support to all protocol icmp error handlers")
Reported-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1cc276ce

04 8月, 2017 1 次提交

sctp: remove the typedef sctp_addip_chunk_t · 68d75469

由 Xin Long 提交于 8月 03, 2017

This patch is to remove the typedef sctp_addip_chunk_t, and
replace with struct sctp_addip_chunk in the places where it's
using this typedef.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68d75469

02 7月, 2017 2 次提交

sctp: remove the typedef sctp_init_chunk_t · 01a992be

由 Xin Long 提交于 6月 30, 2017

This patch is to remove the typedef sctp_init_chunk_t, and replace
with struct sctp_init_chunk in the places where it's using this
typedef.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

01a992be

sctp: remove the typedef sctp_chunkhdr_t · 922dbc5b

由 Xin Long 提交于 6月 30, 2017

This patch is to remove the typedef sctp_chunkhdr_t, and replace
with struct sctp_chunkhdr in the places where it's using this
typedef.

It is also to fix some indents and use sizeof(variable) instead
of sizeof(type)., especially in sctp_new.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

922dbc5b

27 5月, 2017 1 次提交

sctp: fix ICMP processing if skb is non-linear · 804ec7eb

由 Davide Caratti 提交于 5月 25, 2017

sometimes ICMP replies to INIT chunks are ignored by the client, even if
the encapsulated SCTP headers match an open socket. This happens when the
ICMP packet is carried by a paged skb: use skb_header_pointer() to read
packet contents beyond the SCTP header, so that chunk header and initiate
tag are validated correctly.

v2:
- don't use skb_header_pointer() to read the transport header, since
  icmp_socket_deliver() already puts these 8 bytes in the linear area.
- change commit message to make specific reference to INIT chunks.
Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: NVlad Yasevich <vyasevich@gmail.com>
Reviewed-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

804ec7eb

05 4月, 2017 1 次提交

sctp: get sock from transport in sctp_transport_update_pmtu · 3ebfdf08

由 Xin Long 提交于 4月 04, 2017

This patch is almost to revert commit 02f3d4ce ("sctp: Adjust PMTU
updates to accomodate route invalidation."). As t->asoc can't be NULL
in sctp_transport_update_pmtu, it could get sk from asoc, and no need
to pass sk into that function.

It is also to remove some duplicated codes from that function.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ebfdf08

02 3月, 2017 1 次提交

sctp: call rcu_read_lock before checking for duplicate transport nodes · 5179b266

由 Xin Long 提交于 2月 28, 2017

Commit cd2b7087 ("sctp: check duplicate node before inserting a
new transport") called rhltable_lookup() to check for the duplicate
transport node in transport rhashtable.

But rhltable_lookup() doesn't call rcu_read_lock inside, it could cause
a use-after-free issue if it tries to dereference the node that another
cpu has freed it. Note that sock lock can not avoid this as it is per
sock.

This patch is to fix it by calling rcu_read_lock before checking for
duplicate transport nodes.

Fixes: cd2b7087 ("sctp: check duplicate node before inserting a new transport")
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5179b266

20 2月, 2017 1 次提交

sctp: check duplicate node before inserting a new transport · cd2b7087

由 Xin Long 提交于 2月 17, 2017

sctp has changed to use rhlist for transport rhashtable since commit
7fda702f ("sctp: use new rhlist interface on sctp transport
rhashtable").

But rhltable_insert_key doesn't check the duplicate node when inserting
a node, unlike rhashtable_lookup_insert_key. It may cause duplicate
assoc/transport in rhashtable. like:

 client (addr A, B)                 server (addr X, Y)
    connect to X           INIT (1)
                        ------------>
    connect to Y           INIT (2)
                        ------------>
                         INIT_ACK (1)
                        <------------
                         INIT_ACK (2)
                        <------------

After sending INIT (2), one transport will be created and hashed into
rhashtable. But when receiving INIT_ACK (1) and processing the address
params, another transport will be created and hashed into rhashtable
with the same addr Y and EP as the last transport. This will confuse
the assoc/transport's lookup.

This patch is to fix it by returning err if any duplicate node exists
before inserting it.

Fixes: 7fda702f ("sctp: use new rhlist interface on sctp transport rhashtable")
Reported-by: NFabio M. Di Nitto <fdinitto@redhat.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd2b7087

29 12月, 2016 1 次提交

sctp: add pr_debug for tracking asocs not found · b77b7565

由 Marcelo Ricardo Leitner 提交于 12月 28, 2016

This pr_debug may help identify why the system is generating some
Aborts. It's not something a sysadmin would be expected to use.
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b77b7565

17 11月, 2016 1 次提交

sctp: use new rhlist interface on sctp transport rhashtable · 7fda702f

由 Xin Long 提交于 11月 15, 2016

Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key
to hash a node to one chain. If in one host thousands of assocs connect
to one server with the same lport and different laddrs (although it's
not a normal case), all the transports would be hashed into the same
chain.

It may cause to keep returning -EBUSY when inserting a new node, as the
chain is too long and sctp inserts a transport node in a loop, which
could even lead to system hangs there.

The new rhlist interface works for this case that there are many nodes
with the same key in one chain. It puts them into a list then makes this
list be as a node of the chain.

This patch is to replace rhashtable_ interface with rhltable_ interface.
Since a chain would not be too long and it would not return -EBUSY with
this fix when inserting a node, the reinsert loop is also removed here.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7fda702f

01 11月, 2016 2 次提交

sctp: hold transport instead of assoc when lookup assoc in rx path · dae399d7

由 Xin Long 提交于 10月 31, 2016

Prior to this patch, in rx path, before calling lock_sock, it needed to
hold assoc when got it by __sctp_lookup_association, in case other place
would free/put assoc.

But in __sctp_lookup_association, it lookup and hold transport, then got
assoc by transport->assoc, then hold assoc and put transport. It means
it didn't hold transport, yet it was returned and later on directly
assigned to chunk->transport.

Without the protection of sock lock, the transport may be freed/put by
other places, which would cause a use-after-free issue.

This patch is to fix this issue by holding transport instead of assoc.
As holding transport can make sure to access assoc is also safe, and
actually it looks up assoc by searching transport rhashtable, to hold
transport here makes more sense.

Note that the function will be renamed later on on another patch.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dae399d7

sctp: return back transport in __sctp_rcv_init_lookup · 7c17fcc7

由 Xin Long 提交于 10月 31, 2016

Prior to this patch, it used a local variable to save the transport that is
looked up by __sctp_lookup_association(), and didn't return it back. But in
sctp_rcv, it is used to initialize chunk->transport. So when hitting this,
even if it found the transport, it was still initializing chunk->transport
with null instead.

This patch is to return the transport back through transport pointer
that is from __sctp_rcv_lookup_harder().
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c17fcc7

22 9月, 2016 1 次提交

sctp: rename WORD_TRUNC/ROUND macros · e2f036a9

由 Marcelo Ricardo Leitner 提交于 9月 21, 2016

To something more meaningful these days, specially because this is
working on packet headers or lengths and which are not tied to any CPU
arch but to the protocol itself.

So, WORD_TRUNC becomes SCTP_TRUNC4 and WORD_ROUND becomes SCTP_PAD4.
Reported-by: NDavid Laight <David.Laight@ACULAB.COM>
Reported-by: NDavid Miller <davem@davemloft.net>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2f036a9

13 9月, 2016 1 次提交

sctp: hold the transport before using it in sctp_hash_cmp · 715f5552

由 Xin Long 提交于 9月 10, 2016

Since commit 4f008781 ("sctp: apply rhashtable api to send/recv
path"), sctp uses transport rhashtable with .obj_cmpfn sctp_hash_cmp,
in which it compares the members of the transport with the rhashtable
args to check if it's the right transport.

But sctp uses the transport without holding it in sctp_hash_cmp, it can
cause a use-after-free panic. As after it gets transport from hashtable,
another CPU may close the sk and free the asoc. In sctp_association_free,
it frees all the transports, meanwhile, the assoc's refcnt may be reduced
to 0, assoc can be destroyed by sctp_association_destroy.

So after that, transport->assoc is actually an unavailable memory address
in sctp_hash_cmp. Although sctp_hash_cmp is under rcu_read_lock, it still
can not avoid this, as assoc is not freed by RCU.

This patch is to hold the transport before checking it's members with
sctp_transport_hold, in which it checks the refcnt first, holds it if
it's not 0.

Fixes: 4f008781 ("sctp: apply rhashtable api to send/recv path")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

715f5552

20 8月, 2016 1 次提交

sctp: linearize early if it's not GSO · 4c2f2454

由 Marcelo Ricardo Leitner 提交于 8月 18, 2016

Because otherwise when crc computation is still needed it's way more
expensive than on a linear buffer to the point that it affects
performance.

It's so expensive that netperf test gives a perf output as below:

Overhead Command Shared Object Symbol
18,62% netserver [kernel.vmlinux] [k] crc32_generic_shift
2,57% netserver [kernel.vmlinux] [k] __pskb_pull_tail
1,94% netserver [kernel.vmlinux] [k] fib_table_lookup
1,90% netserver [kernel.vmlinux] [k] copy_user_enhanced_fast_string
1,66% swapper [kernel.vmlinux] [k] intel_idle
1,63% netserver [kernel.vmlinux] [k] _raw_spin_lock
1,59% netserver [sctp] [k] sctp_packet_transmit
1,55% netserver [kernel.vmlinux] [k] memcpy_erms
1,42% netserver [sctp] [k] sctp_rcv

# netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB

212992 212992 12000 10.00 3016.42 2.88 3.78 1.874 2.462

After patch:
Overhead Command Shared Object Symbol
2,75% netserver [kernel.vmlinux] [k] memcpy_erms
2,63% netserver [kernel.vmlinux] [k] copy_user_enhanced_fast_string
2,39% netserver [kernel.vmlinux] [k] fib_table_lookup
2,04% netserver [kernel.vmlinux] [k] __pskb_pull_tail
1,91% netserver [kernel.vmlinux] [k] _raw_spin_lock
1,91% netserver [sctp] [k] sctp_packet_transmit
1,72% netserver [mlx4_en] [k] mlx4_en_process_rx_cq
1,68% netserver [sctp] [k] sctp_rcv

212992 212992 12000 10.00 3681.77 3.83 3.46 2.045 1.849

Fixes: 3acb50c1 ("sctp: delay as much as possible skb_linearize")
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c2f2454

26 7月, 2016 1 次提交

sctp: fix BH handling on socket backlog · eefc1b1d

由 Marcelo Ricardo Leitner 提交于 7月 23, 2016

Now that the backlog processing is called with BH enabled, we have to
disable BH before taking the socket lock via bh_lock_sock() otherwise
it may dead lock:

sctp_backlog_rcv()
                bh_lock_sock(sk);

                if (sock_owned_by_user(sk)) {
                        if (sk_add_backlog(sk, skb, sk->sk_rcvbuf))
                                sctp_chunk_free(chunk);
                        else
                                backloged = 1;
                } else
                        sctp_inq_push(inqueue, chunk);

                bh_unlock_sock(sk);

while sctp_inq_push() was disabling/enabling BH, but enabling BH
triggers pending softirq, which then may try to re-lock the socket in
sctp_rcv().

[  219.187215]  <IRQ>
[  219.187217]  [<ffffffff817ca3e0>] _raw_spin_lock+0x20/0x30
[  219.187223]  [<ffffffffa041888c>] sctp_rcv+0x48c/0xba0 [sctp]
[  219.187225]  [<ffffffff816e7db2>] ? nf_iterate+0x62/0x80
[  219.187226]  [<ffffffff816f1b14>] ip_local_deliver_finish+0x94/0x1e0
[  219.187228]  [<ffffffff816f1e1f>] ip_local_deliver+0x6f/0xf0
[  219.187229]  [<ffffffff816f1a80>] ? ip_rcv_finish+0x3b0/0x3b0
[  219.187230]  [<ffffffff816f17a8>] ip_rcv_finish+0xd8/0x3b0
[  219.187232]  [<ffffffff816f2122>] ip_rcv+0x282/0x3a0
[  219.187233]  [<ffffffff810d8bb6>] ? update_curr+0x66/0x180
[  219.187235]  [<ffffffff816abac4>] __netif_receive_skb_core+0x524/0xa90
[  219.187236]  [<ffffffff810d8e00>] ? update_cfs_shares+0x30/0xf0
[  219.187237]  [<ffffffff810d557c>] ? __enqueue_entity+0x6c/0x70
[  219.187239]  [<ffffffff810dc454>] ? enqueue_entity+0x204/0xdf0
[  219.187240]  [<ffffffff816ac048>] __netif_receive_skb+0x18/0x60
[  219.187242]  [<ffffffff816ad1ce>] process_backlog+0x9e/0x140
[  219.187243]  [<ffffffff816ac8ec>] net_rx_action+0x22c/0x370
[  219.187245]  [<ffffffff817cd352>] __do_softirq+0x112/0x2e7
[  219.187247]  [<ffffffff817cc3bc>] do_softirq_own_stack+0x1c/0x30
[  219.187247]  <EOI>
[  219.187248]  [<ffffffff810aa1c8>] do_softirq.part.14+0x38/0x40
[  219.187249]  [<ffffffff810aa24d>] __local_bh_enable_ip+0x7d/0x80
[  219.187254]  [<ffffffffa0408428>] sctp_inq_push+0x68/0x80 [sctp]
[  219.187258]  [<ffffffffa04190f1>] sctp_backlog_rcv+0x151/0x1c0 [sctp]
[  219.187260]  [<ffffffff81692b07>] __release_sock+0x87/0xf0
[  219.187261]  [<ffffffff81692ba0>] release_sock+0x30/0xa0
[  219.187265]  [<ffffffffa040e46d>] sctp_accept+0x17d/0x210 [sctp]
[  219.187266]  [<ffffffff810e7510>] ? prepare_to_wait_event+0xf0/0xf0
[  219.187268]  [<ffffffff8172d52c>] inet_accept+0x3c/0x130
[  219.187269]  [<ffffffff8168d7a3>] SYSC_accept4+0x103/0x210
[  219.187271]  [<ffffffff817ca2ba>] ? _raw_spin_unlock_bh+0x1a/0x20
[  219.187272]  [<ffffffff81692bfc>] ? release_sock+0x8c/0xa0
[  219.187276]  [<ffffffffa0413e22>] ? sctp_inet_listen+0x62/0x1b0 [sctp]
[  219.187277]  [<ffffffff8168f2d0>] SyS_accept+0x10/0x20

Fixes: 860fbbc3 ("sctp: prepare for socket backlog behavior change")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eefc1b1d

19 7月, 2016 1 次提交

sctp: load transport header after sk_filter · c74bfbdb

由 Willem de Bruijn 提交于 7月 16, 2016

Do not cache pointers into the skb linear segment across sk_filter.
The function call can trigger pskb_expand_head.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c74bfbdb

14 7月, 2016 2 次提交

sctp: avoid identifying address family many times for a chunk · e7487c86

由 Marcelo Ricardo Leitner 提交于 7月 13, 2016

Identifying address family operations during rx path is not something
expensive but it's ugly to the eye to have it done multiple times,
specially when we already validated it during initial rx processing.

This patch takes advantage of the now shared sctp_input_cb and make the
pointer to the operations readily available.
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e7487c86

sctp: allow others to use sctp_input_cb · 9e238323

由 Marcelo Ricardo Leitner 提交于 7月 13, 2016

We process input path in other files too and having access to it is
nice, so move it to a header where it's shared.
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e238323

04 6月, 2016 2 次提交

sctp: Add GSO support · 90017acc

由 Marcelo Ricardo Leitner 提交于 6月 02, 2016

SCTP has this pecualiarity that its packets cannot be just segmented to
(P)MTU. Its chunks must be contained in IP segments, padding respected.
So we can't just generate a big skb, set gso_size to the fragmentation
point and deliver it to IP layer.

This patch takes a different approach. SCTP will now build a skb as it
would be if it was received using GRO. That is, there will be a cover
skb with protocol headers and children ones containing the actual
segments, already segmented to a way that respects SCTP RFCs.

With that, we can tell skb_segment() to just split based on frag_list,
trusting its sizes are already in accordance.

This way SCTP can benefit from GSO and instead of passing several
packets through the stack, it can pass a single large packet.

v2:
- Added support for receiving GSO frames, as requested by Dave Miller.
- Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
- Added heuristics similar to what we have in TCP for not generating
  single GSO packets that fills cwnd.
v3:
- consider sctphdr size in skb_gso_transport_seglen()
- rebased due to 5c7cdf33 ("gso: Remove arbitrary checks for
  unsupported GSO")
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

90017acc

sctp: delay as much as possible skb_linearize · 3acb50c1

由 Marcelo Ricardo Leitner 提交于 6月 02, 2016

This patch is a preparation for the GSO one. In order to successfully
handle GSO packets on rx path we must not call skb_linearize, otherwise
it defeats any gain GSO may have had.

This patch thus delays as much as possible the call to skb_linearize,
leaving it to sctp_inq_pop() moment. For that the sanity checks
performed now know how to deal with fragments.

One positive side-effect of this is that if the socket is backlogged it
will have the chance of doing it on backlog processing instead of
during softirq.

With this move, it's evident that a check for non-linearity in
sctp_inq_pop was ineffective and is now removed. Note that a similar
check is performed a bit below this one.
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3acb50c1

28 4月, 2016 3 次提交

net: rename NET_{ADD|INC}_STATS_BH() · 02a1d6e7

由 Eric Dumazet 提交于 4月 27, 2016

Rename NET_INC_STATS_BH() to __NET_INC_STATS()
and NET_ADD_STATS_BH() to __NET_ADD_STATS()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02a1d6e7

net: sctp: rename SCTP_INC_STATS_BH() · 08e3baef

由 Eric Dumazet 提交于 4月 27, 2016

Rename SCTP_INC_STATS_BH() to __SCTP_INC_STATS()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

08e3baef

net: rename ICMP_INC_STATS_BH() · 5d3848bc

由 Eric Dumazet 提交于 4月 27, 2016

Rename ICMP_INC_STATS_BH() to __ICMP_INC_STATS()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d3848bc

21 3月, 2016 1 次提交

sctp: align MTU to a word · 3822a5ff

由 Marcelo Ricardo Leitner 提交于 3月 19, 2016

SCTP is a protocol that is aligned to a word (4 bytes). Thus using bare
MTU can sometimes return values that are not aligned, like for loopback,
which is 65536 but ipv4_mtu() limits that to 65535. This mis-alignment
will cause the last non-aligned bytes to never be used and can cause
issues with congestion control.

So it's better to just consider a lower MTU and keep congestion control
calcs saner as they are based on PMTU.

Same applies to icmp frag needed messages, which is also fixed by this
patch.

One other effect of this is the inability to send MTU-sized packet
without queueing or fragmentation and without hitting Nagle. As the
check performed at sctp_packet_can_append_data():

if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead)
	/* Enough data queued to fill a packet */
	return SCTP_XMIT_OK;

with the above example of MTU, if there are no other messages queued,
one cannot send a packet that just fits one packet (65532 bytes) and
without causing DATA chunk fragmentation or a delay.

v2:
 - Added WORD_TRUNC macro
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3822a5ff

14 3月, 2016 1 次提交

sctp: allow sctp_transmit_packet and others to use gfp · cea8768f

由 Marcelo Ricardo Leitner 提交于 3月 10, 2016

Currently sctp_sendmsg() triggers some calls that will allocate memory
with GFP_ATOMIC even when not necessary. In the case of
sctp_packet_transmit it will allocate a linear skb that will be used to
construct the packet and this may cause sends to fail due to ENOMEM more
often than anticipated specially with big MTUs.

This patch thus allows it to inherit gfp flags from upper calls so that
it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
similar. All others, like retransmits or flushes started from BH, are
still allocated using GFP_ATOMIC.

In netperf tests this didn't result in any performance drawbacks when
memory is not too fragmented and made it trigger ENOMEM way less often.
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cea8768f

18 2月, 2016 1 次提交

sctp: move rcu_read_lock from __sctp_lookup_association to sctp_lookup_association · f46c7011

由 Xin Long 提交于 2月 15, 2016

__sctp_lookup_association() is only invoked by sctp_v4_err() and
sctp_rcv(), both which run on the rx BH, and it has been protected
by rcu_read_lock [see ip_local_deliver_finish() / ipv6_rcv()].

So we can move it to sctp_lookup_association, only let
sctp_lookup_association use rcu_read_lock.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f46c7011

29 1月, 2016 1 次提交

sctp: fix the transport dead race check by using atomic_add_unless on refcnt · 1eed6779

由 Xin Long 提交于 1月 22, 2016

Now when __sctp_lookup_association is running in BH, it will try to
check if t->dead is set, but meanwhile other CPUs may be freeing this
transport and this assoc and if it happens that
__sctp_lookup_association checked t->dead a bit too early, it may think
that the association is still good while it was already freed.

So we fix this race by using atomic_add_unless in sctp_transport_hold.
After we get one transport from hashtable, we will hold it only when
this transport's refcnt is not 0, so that we can make sure t->asoc
cannot be freed before we hold the asoc again.

Note that sctp association is not freed using RCU so we can't use
atomic_add_unless() with it as it may just be too late for that either.

Fixes: 4f008781 ("sctp: apply rhashtable api to send/recv path")
Reported-by: NVlad Yasevich <vyasevich@gmail.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1eed6779

18 1月, 2016 1 次提交

sctp: the temp asoc's transports should not be hashed/unhashed · dd7445ad

由 Xin Long 提交于 1月 16, 2016

Re-establish the previous behavior and avoid hashing temporary asocs by
checking t->asoc->temp in sctp_(un)hash_transport. Also, remove the
check of t->asoc->temp in __sctp_lookup_association, since they are
never hashed now.

Fixes: 4f008781 ("sctp: apply rhashtable api to send/recv path")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reported-by: NVlad Yasevich <vyasevich@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dd7445ad

16 1月, 2016 1 次提交

sctp: support to lookup with ep+paddr in transport rhashtable · 65a5124a

由 Xin Long 提交于 1月 14, 2016

Now, when we sendmsg, we translate the ep to laddr by selecting the
first element of the list, and then do a lookup for a transport.

But sctp_hash_cmp() will compare it against asoc addr_list, which may
be a subset of ep addr_list, meaning that this chosen laddr may not be
there, and thus making it impossible to find the transport.

So we fix it by using ep + paddr to lookup transports in hashtable. In
sctp_hash_cmp, if .ep is set, we will check if this ep == asoc->ep,
or we will do the laddr check.

Fixes: d6c0256a ("sctp: add the rhashtable apis for sctp global transport hashtable")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reported-by: NVlad Yasevich <vyasevich@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

65a5124a

06 1月, 2016 3 次提交

sctp: drop the old assoc hashtable of sctp · b5eff712

由 Xin Long 提交于 12月 30, 2015

transport hashtable will replace the association hashtable,
so association hashtable is not used in sctp any more, so
drop the codes about that.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5eff712

sctp: apply rhashtable api to send/recv path · 4f008781

由 Xin Long 提交于 12月 30, 2015

apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc
and __sctp_lookup_association, it's invoked in the protection of sock
lock, it will be safe, but sctp_lookup_association need to call
rcu_read_lock() and to detect the t->dead to protect it.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4f008781

sctp: add the rhashtable apis for sctp global transport hashtable · d6c0256a

由 Xin Long 提交于 12月 30, 2015

tranport hashtbale will replace the association hashtable to do the
lookup for transport, and then get association by t->assoc, rhashtable
apis will be used because of it's resizable, scalable and using rcu.

lport + rport + paddr will be the base hashkey to locate the chain,
with net to protect one netns from another, then plus the laddr to
compare to get the target.

this patch will provider the lookup functions:
- sctp_epaddr_lookup_transport
- sctp_addrs_lookup_transport

hash/unhash functions:
- sctp_hash_transport
- sctp_unhash_transport

init/destroy functions:
- sctp_transport_hashtable_init
- sctp_transport_hashtable_destroy
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d6c0256a

30 8月, 2014 1 次提交

sctp: Change sctp to implement csum_levels · 202863fe

由 Tom Herbert 提交于 8月 27, 2014

CHECKSUM_UNNECESSARY may be applied to the SCTP CRC so we need to
appropriate account for this by decrementing csum_level. This is
done by calling __skb_dec_checksum_unnecessary.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

202863fe

01 8月, 2014 1 次提交

net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS · 7304fe46

由 Duan Jiong 提交于 7月 31, 2014

When dealing with ICMPv[46] Error Message, function icmp_socket_deliver()
and icmpv6_notify() do some valid checks on packet's length, but then some
protocols check packet's length redaudantly. So remove those duplicated
statements, and increase counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS in
function icmp_socket_deliver() and icmpv6_notify() respectively.

In addition, add missed counter in udp6/udplite6 when socket is NULL.
Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7304fe46

22 1月, 2014 2 次提交

sctp: remove macros sctp_bh_[un]lock_sock · 5bc1d1b4

由 wangweidong 提交于 1月 21, 2014

Redefined bh_[un]lock_sock to sctp_bh[un]lock_sock for user
space friendly code which we haven't use in years, so removing them.
Signed-off-by: NWang Weidong <wangweidong1@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5bc1d1b4

sctp: remove macros sctp_write_[un]_lock · 387602df

由 wangweidong 提交于 1月 21, 2014

Redefined write_[un]lock to sctp_write_[un]lock for user space
friendly code which we haven't use in years, so removing them.
Signed-off-by: NWang Weidong <wangweidong1@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

387602df

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功