Commit be25f43a, authored by Jakub Kicinski

Merge branch 'sctp-implement-rfc6951-udp-encapsulation-of-sctp'

Xin Long says:

====================
sctp: Implement RFC6951: UDP Encapsulation of SCTP

Description From the RFC:

   The Main Reasons:

   o  To allow SCTP traffic to pass through legacy NATs, which do not
      provide native SCTP support as specified in [BEHAVE] and
      [NATSUPP].

   o  To allow SCTP to be implemented on hosts that do not provide
      direct access to the IP layer.  In particular, applications can
      use their own SCTP implementation if the operating system does not
      provide one.

   Implementation Notes:

   UDP-encapsulated SCTP is normally communicated between SCTP stacks
   using the IANA-assigned UDP port number 9899 (sctp-tunneling) on both
   ends.  There are circumstances where other ports may be used on
   either end, and it might be required to use ports other than the
   registered port.

   Each SCTP stack uses a single local UDP encapsulation port number as
   the destination port for all its incoming SCTP packets, this greatly
   simplifies implementation design.

   An SCTP implementation supporting UDP encapsulation MUST maintain a
   remote UDP encapsulation port number per destination address for each
   SCTP association.  Again, because the remote stack may be using ports
   other than the well-known port, each port may be different from each
   stack.  However, because of remapping of ports by NATs, the remote
   ports associated with different remote IP addresses may not be
   identical, even if they are associated with the same stack.

   Because the well-known port might not be used, implementations need
   to allow other port numbers to be specified as a local or remote UDP
   encapsulation port number through APIs.

Patches:

   This patchset uses the udp4/6 tunnel APIs to implement UDP
   encapsulation of SCTP, with little change to the SCTP protocol stack
   and with all current SCTP features kept in the Linux kernel.

   1 - 4: Fix some UDP issues that may be triggered by SCTP over UDP.
   5 - 7: Process incoming UDP encapsulated packets and ICMP packets.
   8 -10: Update the remote encap port via sysctl, sockopt and packets.
   11-14: Process outgoing UDP-encapsulated packets and their GSO.
   15-16: Add the part from draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.
      17: Enable this feature.

Tests:

  - lksctp-tools/src/func_tests with UDP Encapsulation enabled/disabled:

      Both make v4test and v6test passed.

  - sctp-tests with UDP Encapsulation enabled/disabled:

      repeatability/procdumps/sctpdiag/gsomtuchange/extoverflow/
      sctphashtable passed. Others failed as expected due to those
      "iptables -p sctp" rules.

  - netperf on lo/netns/virtio_net, with gso enabled/disabled and
    with ip_checksum enabled/disabled, with UDP Encapsulation
    enabled/disabled:

      No clear performance drop.

v1->v2:
  - Fix some incorrect code in the patches 5,6,8,10,11,13,14,17, suggested
    by Marcelo.
  - Append two patches 15-16 to add the Additional Considerations for UDP
    Encapsulation of SCTP from draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.
v2->v3:
  - remove the cleanup code in patch 2, suggested by Willem.
  - remove the patch 3 and fix the checksum in the new patch 3 after
    talking with Paolo, Marcelo and Guillaume.
  - add 'select NET_UDP_TUNNEL' in patch 4 to solve a compiling error.
  - fix __be16 type cast warning in patch 8.
  - fix the wrong endian orders when setting values in 14,16.
v3->v4:
  - add entries in ip-sysctl.rst in patch 7,16, as Marcelo suggested.
  - do not create udp socks when udp_port is set to 0 in patch 16, as
    Marcelo noticed.
v4->v5:
  - improve the description for udp_port and encap_port entries in patch
    7, 16.
  - use 0 as the default udp_port.
====================

Link: https://lore.kernel.org/r/cover.1603955040.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
......@@ -2642,6 +2642,37 @@ addr_scope_policy - INTEGER
Default: 1
udp_port - INTEGER
The listening port for the local UDP tunneling sock. Normally it's
using the IANA-assigned UDP port number 9899 (sctp-tunneling).
This UDP sock is used for processing the incoming UDP-encapsulated
SCTP packets (from RFC6951), and shared by all applications in the
same net namespace. This UDP sock will be closed when the value is
set to 0.
The value will also be used to set the src port of the UDP header
for the outgoing UDP-encapsulated SCTP packets. For the dest port,
please refer to 'encap_port' below.
Default: 0
encap_port - INTEGER
The default remote UDP encapsulation port.
This value is used to set the dest port of the UDP header for the
outgoing UDP-encapsulated SCTP packets by default. Users can also
change the value for each sock/asoc/transport by using setsockopt.
For further information, please refer to RFC6951.
Note that when connecting to a remote server, the client should set
this to the port that the UDP tunneling sock on the peer server is
listening to and the local UDP tunneling sock on the client also
must be started. On the server, it would get the encap_port from
the incoming packet's source port.
Default: 0
``/proc/sys/net/core/*``
========================
......
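To make the interplay of the two sysctls concrete, here is a minimal userspace sketch (not kernel code). It models the rule used by the transmit paths in this series: a packet is UDP-encapsulated only when both the local `udp_port` and the transport's `encap_port` are non-zero, with `udp_port` becoming the UDP source port and `encap_port` the destination port. The names `encap_ports` and `choose_encap_ports` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the UDP ports chosen for one outgoing packet. */
struct encap_ports {
	bool     encapsulate; /* false: send as plain SCTP over IP */
	uint16_t src;         /* UDP source port (local udp_port) */
	uint16_t dst;         /* UDP destination port (remote encap_port) */
};

/* Ports are in host byte order here for readability; the kernel keeps
 * them as __be16 values.
 */
static struct encap_ports choose_encap_ports(uint16_t udp_port,
					     uint16_t encap_port)
{
	struct encap_ports p = { false, 0, 0 };

	/* Either port being 0 disables encapsulation for this packet. */
	if (udp_port == 0 || encap_port == 0)
		return p;

	p.encapsulate = true;
	p.src = udp_port;
	p.dst = encap_port;
	return p;
}
```

With both defaults left at 0, this model never encapsulates, which matches the feature being off until an administrator sets the ports.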
......@@ -482,11 +482,13 @@ enum sctp_error {
* 11 Restart of an association with new addresses
* 12 User Initiated Abort
* 13 Protocol Violation
* 14 Restart of an Association with New Encapsulation Port
*/
SCTP_ERROR_RESTART = cpu_to_be16(0x0b),
SCTP_ERROR_USER_ABORT = cpu_to_be16(0x0c),
SCTP_ERROR_PROTO_VIOLATION = cpu_to_be16(0x0d),
SCTP_ERROR_NEW_ENCAP_PORT = cpu_to_be16(0x0e),
/* ADDIP Section 3.3 New Error Causes
*
......@@ -793,4 +795,22 @@ enum {
SCTP_FLOWLABEL_VAL_MASK = 0xfffff
};
/* UDP Encapsulation
* draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.html#section-4-4
*
* The error cause indicating an "Restart of an Association with
* New Encapsulation Port"
*
* 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
* +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
* | Cause Code = 14 | Cause Length = 8 |
* +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
* | Current Encapsulation Port | New Encapsulation Port |
* +-------------------------------+-------------------------------+
*/
struct sctp_new_encap_port_hdr {
__be16 cur_port;
__be16 new_port;
};
#endif /* __LINUX_SCTP_H__ */
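The wire layout in the comment above can be illustrated with a small userspace sketch that serializes the 8-byte error cause by hand. This is only an illustration of the byte layout, assuming all four 16-bit fields are sent in network byte order as the `__be16` types suggest; `EXAMPLE_CAUSE_NEW_ENCAP_PORT` and `build_new_encap_port_cause` are hypothetical names, not kernel symbols.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the value of SCTP_ERROR_NEW_ENCAP_PORT (0x0e). */
#define EXAMPLE_CAUSE_NEW_ENCAP_PORT 14

/* Writes the 8-byte cause into out[] and returns its length.
 * Ports are passed in host byte order and converted here.
 */
static size_t build_new_encap_port_cause(uint8_t out[8], uint16_t cur_port,
					 uint16_t new_port)
{
	uint16_t fields[4];

	fields[0] = htons(EXAMPLE_CAUSE_NEW_ENCAP_PORT); /* Cause Code */
	fields[1] = htons(8);                            /* Cause Length */
	fields[2] = htons(cur_port);  /* Current Encapsulation Port */
	fields[3] = htons(new_port);  /* New Encapsulation Port */
	memcpy(out, fields, sizeof(fields));
	return sizeof(fields);
}
```

The cause length of 8 covers the 4-byte cause header plus the 4 bytes of `struct sctp_new_encap_port_hdr`.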
......@@ -22,6 +22,14 @@ struct netns_sctp {
*/
struct sock *ctl_sock;
/* UDP tunneling listening sock. */
struct sock *udp4_sock;
struct sock *udp6_sock;
/* UDP tunneling listening port. */
int udp_port;
/* UDP tunneling remote encap port. */
int encap_port;
/* This is the global local address list.
* We actively maintain this complete list of addresses on
* the system by catching address add/delete events.
......
......@@ -286,6 +286,8 @@ enum { SCTP_MAX_GABS = 16 };
* functions simpler to write.
*/
#define SCTP_DEFAULT_UDP_PORT 9899 /* default UDP tunneling port */
/* These are the values for pf exposure, UNUSED is to keep compatible with old
* applications by default.
*/
......
......@@ -84,6 +84,8 @@ int sctp_copy_local_addr_list(struct net *net, struct sctp_bind_addr *addr,
struct sctp_pf *sctp_get_pf_specific(sa_family_t family);
int sctp_register_pf(struct sctp_pf *, sa_family_t);
void sctp_addr_wq_mgmt(struct net *, struct sctp_sockaddr_entry *, int);
int sctp_udp_sock_start(struct net *net);
void sctp_udp_sock_stop(struct net *net);
/*
* sctp/socket.c
......@@ -576,10 +578,13 @@ static inline __u32 sctp_mtu_payload(const struct sctp_sock *sp,
{
__u32 overhead = sizeof(struct sctphdr) + extra;
if (sp)
if (sp) {
overhead += sp->pf->af->net_header_len;
else
if (sp->udp_port)
overhead += sizeof(struct udphdr);
} else {
overhead += sizeof(struct ipv6hdr);
}
if (WARN_ON_ONCE(mtu && mtu <= overhead))
mtu = overhead;
......
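The effect of the `sctp_mtu_payload()` change on usable payload can be worked through with a simplified userspace model. It assumes fixed header sizes (12-byte SCTP common header, 8-byte UDP header, 20-byte IPv4 and 40-byte IPv6 headers without options or extension headers) and returns 0 instead of warning when the MTU is too small. `sctp_payload_room` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
	SCTP_COMMON_HDR_LEN = 12, /* SCTP common header */
	UDP_HDR_LEN         = 8,  /* added only when encapsulating */
	IPV4_HDR_LEN        = 20, /* no IP options */
	IPV6_HDR_LEN        = 40, /* no extension headers */
};

/* Bytes left for SCTP chunks in one packet of size mtu. */
static uint32_t sctp_payload_room(uint32_t mtu, bool ipv6, bool udp_encap,
				  uint32_t extra)
{
	uint32_t overhead = SCTP_COMMON_HDR_LEN + extra;

	overhead += ipv6 ? IPV6_HDR_LEN : IPV4_HDR_LEN;
	if (udp_encap)
		overhead += UDP_HDR_LEN;

	/* Simplification: report no room rather than clamping the MTU. */
	return mtu > overhead ? mtu - overhead : 0;
}
```

For a 1500-byte MTU over IPv4 this gives 1468 bytes without encapsulation and 1460 bytes with it, so each encapsulated packet carries 8 fewer bytes of chunks.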
......@@ -221,6 +221,9 @@ struct sctp_chunk *sctp_make_violation_paramlen(
struct sctp_chunk *sctp_make_violation_max_retrans(
const struct sctp_association *asoc,
const struct sctp_chunk *chunk);
struct sctp_chunk *sctp_make_new_encap_port(
const struct sctp_association *asoc,
const struct sctp_chunk *chunk);
struct sctp_chunk *sctp_make_heartbeat(const struct sctp_association *asoc,
const struct sctp_transport *transport);
struct sctp_chunk *sctp_make_heartbeat_ack(const struct sctp_association *asoc,
......@@ -380,6 +383,7 @@ sctp_vtag_verify(const struct sctp_chunk *chunk,
if (ntohl(chunk->sctp_hdr->vtag) == asoc->c.my_vtag)
return 1;
chunk->transport->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port;
return 0;
}
......
......@@ -178,6 +178,9 @@ struct sctp_sock {
*/
__u32 hbinterval;
__be16 udp_port;
__be16 encap_port;
/* This is the max_retrans value for new associations. */
__u16 pathmaxrxt;
......@@ -877,6 +880,8 @@ struct sctp_transport {
*/
unsigned long last_time_ecne_reduced;
__be16 encap_port;
/* This is the max_retrans value for the transport and will
* be initialized from the assocs value. This can be changed
* using the SCTP_SET_PEER_ADDR_PARAMS socket option.
......@@ -1116,14 +1121,9 @@ static inline void sctp_outq_cork(struct sctp_outq *q)
* sctp_input_cb is currently used on rx and sock rx queue
*/
struct sctp_input_cb {
union {
struct inet_skb_parm h4;
#if IS_ENABLED(CONFIG_IPV6)
struct inet6_skb_parm h6;
#endif
} header;
struct sctp_chunk *chunk;
struct sctp_af *af;
__be16 encap_port;
};
#define SCTP_INPUT_CB(__skb) ((struct sctp_input_cb *)&((__skb)->cb[0]))
......@@ -1790,6 +1790,8 @@ struct sctp_association {
*/
unsigned long hbinterval;
__be16 encap_port;
/* This is the max_retrans value for new transports in the
* association.
*/
......
......@@ -140,6 +140,7 @@ typedef __s32 sctp_assoc_t;
#define SCTP_ECN_SUPPORTED 130
#define SCTP_EXPOSE_POTENTIALLY_FAILED_STATE 131
#define SCTP_EXPOSE_PF_STATE SCTP_EXPOSE_POTENTIALLY_FAILED_STATE
#define SCTP_REMOTE_UDP_ENCAPS_PORT 132
/* PR-SCTP policies */
#define SCTP_PR_SCTP_NONE 0x0000
......@@ -1197,6 +1198,12 @@ struct sctp_event {
uint8_t se_on;
};
struct sctp_udpencaps {
sctp_assoc_t sue_assoc_id;
struct sockaddr_storage sue_address;
uint16_t sue_port;
};
/* SCTP Stream schedulers */
enum sctp_sched_type {
SCTP_SS_FCFS,
......
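From userspace, the new option might be used roughly as sketched below. The sketch mirrors the UAPI additions locally so it builds against older headers; everything prefixed with `example_` or `EXAMPLE_`, plus the helpers `fill_udpencaps` and `set_remote_encap_port`, is hypothetical. It assumes that `sue_port` is passed in network byte order (the kernel handler casts it directly to `__be16`) and that a zeroed `sue_address` is treated as a wildcard, so the setting applies to the association or socket rather than to one transport.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Local mirror of SCTP_REMOTE_UDP_ENCAPS_PORT from the patch. */
#define EXAMPLE_SCTP_REMOTE_UDP_ENCAPS_PORT 132

/* Local mirror of struct sctp_udpencaps; sctp_assoc_t is a 32-bit int. */
struct example_udpencaps {
	int32_t                 sue_assoc_id;
	struct sockaddr_storage sue_address;
	uint16_t                sue_port; /* network byte order */
};

/* Zero the request (wildcard address) and fill in id and port. */
static void fill_udpencaps(struct example_udpencaps *e, int32_t assoc_id,
			   uint16_t port)
{
	memset(e, 0, sizeof(*e));
	e->sue_assoc_id = assoc_id;
	e->sue_port = htons(port);
}

/* Returns 0 on success, -1 with errno set on failure (as setsockopt does). */
static int set_remote_encap_port(int fd, int32_t assoc_id, uint16_t port)
{
	struct example_udpencaps e;

	fill_udpencaps(&e, assoc_id, port);
	return setsockopt(fd, IPPROTO_SCTP,
			  EXAMPLE_SCTP_REMOTE_UDP_ENCAPS_PORT,
			  &e, sizeof(e));
}
```

A client would typically call `set_remote_encap_port(fd, 0, 9899)` on its SCTP socket before `connect()`, after the local `udp_port` sysctl has been set; on kernels without this series the call is expected to fail.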
......@@ -702,7 +702,7 @@ int __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
iph->saddr, uh->source, skb->dev->ifindex,
inet_sdif(skb), udptable, NULL);
if (!sk) {
if (!sk || udp_sk(sk)->encap_type) {
/* No socket for error: try tunnels before discarding */
sk = ERR_PTR(-ENOENT);
if (static_branch_unlikely(&udp_encap_needed_key)) {
......
......@@ -49,6 +49,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
__skb_pull(skb, tnl_hlen);
skb_reset_mac_header(skb);
skb_set_network_header(skb, skb_inner_network_offset(skb));
skb_set_transport_header(skb, skb_inner_transport_offset(skb));
skb->mac_len = skb_inner_network_offset(skb);
skb->protocol = new_protocol;
......@@ -67,6 +68,8 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
(NETIF_F_HW_CSUM | NETIF_F_IP_CSUM))));
features &= skb->dev->hw_enc_features;
/* CRC checksum can't be handled by HW when it's a UDP tunneling packet. */
features &= ~NETIF_F_SCTP_CRC;
/* The only checksum offload we care about from here on out is the
* outer one so strip the existing checksum feature flags and
......
......@@ -560,7 +560,7 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source,
inet6_iif(skb), inet6_sdif(skb), udptable, NULL);
if (!sk) {
if (!sk || udp_sk(sk)->encap_type) {
/* No socket for error: try tunnels before discarding */
sk = ERR_PTR(-ENOENT);
if (static_branch_unlikely(&udpv6_encap_needed_key)) {
......
......@@ -28,10 +28,6 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
int tnl_hlen;
int err;
mss = skb_shinfo(skb)->gso_size;
if (unlikely(skb->len <= mss))
goto out;
if (skb->encapsulation && skb_shinfo(skb)->gso_type &
(SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))
segs = skb_udp_tunnel_segment(skb, features, true);
......@@ -48,6 +44,10 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
return __udp_gso_segment(skb, features);
mss = skb_shinfo(skb)->gso_size;
if (unlikely(skb->len <= mss))
goto out;
/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
* do checksum of UDP packets sent as multiple IP fragments.
*/
......
......@@ -11,6 +11,7 @@ menuconfig IP_SCTP
select CRYPTO_HMAC
select CRYPTO_SHA1
select LIBCRC32C
select NET_UDP_TUNNEL
help
Stream Control Transmission Protocol
......
......@@ -99,6 +99,8 @@ static struct sctp_association *sctp_association_init(
*/
asoc->hbinterval = msecs_to_jiffies(sp->hbinterval);
asoc->encap_port = sp->encap_port;
/* Initialize path max retrans value. */
asoc->pathmaxrxt = sp->pathmaxrxt;
......@@ -624,6 +626,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
*/
peer->hbinterval = asoc->hbinterval;
peer->encap_port = asoc->encap_port;
/* Set the path max_retrans. */
peer->pathmaxrxt = asoc->pathmaxrxt;
......
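The two assignments above form an inheritance chain: a new association copies the socket's `encap_port`, and each new peer transport copies the association's. A toy userspace model (all `model_*` names are hypothetical) shows one consequence, namely that changing the socket default later does not touch transports that already exist.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for sctp_sock, sctp_association, sctp_transport. */
struct model_sock      { uint16_t encap_port; };
struct model_assoc     { uint16_t encap_port; };
struct model_transport { uint16_t encap_port; };

/* Models the copy done in sctp_association_init(). */
static void model_assoc_init(struct model_assoc *asoc,
			     const struct model_sock *sp)
{
	asoc->encap_port = sp->encap_port;
}

/* Models the copy done in sctp_assoc_add_peer(). */
static void model_add_peer(struct model_transport *t,
			   const struct model_assoc *asoc)
{
	t->encap_port = asoc->encap_port;
}
```

Because the values are copied rather than referenced, per-transport overrides made later (by setsockopt or by an incoming packet's source port) stay local to that transport.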
......@@ -55,6 +55,7 @@
#include <net/inet_common.h>
#include <net/inet_ecn.h>
#include <net/sctp/sctp.h>
#include <net/udp_tunnel.h>
#include <linux/uaccess.h>
......@@ -191,33 +192,53 @@ static int sctp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
return ret;
}
static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *transport)
static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *t)
{
struct dst_entry *dst = dst_clone(t->dst);
struct flowi6 *fl6 = &t->fl.u.ip6;
struct sock *sk = skb->sk;
struct ipv6_pinfo *np = inet6_sk(sk);
struct flowi6 *fl6 = &transport->fl.u.ip6;
__u8 tclass = np->tclass;
int res;
__be32 label;
pr_debug("%s: skb:%p, len:%d, src:%pI6 dst:%pI6\n", __func__, skb,
skb->len, &fl6->saddr, &fl6->daddr);
if (transport->dscp & SCTP_DSCP_SET_MASK)
tclass = transport->dscp & SCTP_DSCP_VAL_MASK;
if (t->dscp & SCTP_DSCP_SET_MASK)
tclass = t->dscp & SCTP_DSCP_VAL_MASK;
if (INET_ECN_is_capable(tclass))
IP6_ECN_flow_xmit(sk, fl6->flowlabel);
if (!(transport->param_flags & SPP_PMTUD_ENABLE))
if (!(t->param_flags & SPP_PMTUD_ENABLE))
skb->ignore_df = 1;
SCTP_INC_STATS(sock_net(sk), SCTP_MIB_OUTSCTPPACKS);
rcu_read_lock();
res = ip6_xmit(sk, skb, fl6, sk->sk_mark, rcu_dereference(np->opt),
tclass, sk->sk_priority);
rcu_read_unlock();
return res;
if (!t->encap_port || !sctp_sk(sk)->udp_port) {
int res;
skb_dst_set(skb, dst);
rcu_read_lock();
res = ip6_xmit(sk, skb, fl6, sk->sk_mark,
rcu_dereference(np->opt),
tclass, sk->sk_priority);
rcu_read_unlock();
return res;
}
if (skb_is_gso(skb))
skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
skb->encapsulation = 1;
skb_reset_inner_mac_header(skb);
skb_reset_inner_transport_header(skb);
skb_set_inner_ipproto(skb, IPPROTO_SCTP);
label = ip6_make_flowlabel(sock_net(sk), skb, fl6->flowlabel, true, fl6);
return udp_tunnel6_xmit_skb(dst, sk, skb, NULL, &fl6->saddr,
&fl6->daddr, tclass, ip6_dst_hoplimit(dst),
label, sctp_sk(sk)->udp_port, t->encap_port, false);
}
/* Returns the dst cache entry for the given source and destination ip
......@@ -1053,6 +1074,7 @@ static struct inet_protosw sctpv6_stream_protosw = {
static int sctp6_rcv(struct sk_buff *skb)
{
memset(skb->cb, 0, sizeof(skb->cb));
return sctp_rcv(skb) ? -1 : 0;
}
......
......@@ -27,7 +27,11 @@ static __le32 sctp_gso_make_checksum(struct sk_buff *skb)
{
skb->ip_summed = CHECKSUM_NONE;
skb->csum_not_inet = 0;
gso_reset_checksum(skb, ~0);
/* csum and csum_start in GSO CB may be needed to do the UDP
* checksum when it's a UDP tunneling packet.
*/
SKB_GSO_CB(skb)->csum = (__force __wsum)~0;
SKB_GSO_CB(skb)->csum_start = skb_headroom(skb) + skb->len;
return sctp_compute_cksum(skb, skb_transport_offset(skb));
}
......
......@@ -508,20 +508,14 @@ static int sctp_packet_pack(struct sctp_packet *packet,
sizeof(struct inet6_skb_parm)));
skb_shinfo(head)->gso_segs = pkt_count;
skb_shinfo(head)->gso_size = GSO_BY_FRAGS;
rcu_read_lock();
if (skb_dst(head) != tp->dst) {
dst_hold(tp->dst);
sk_setup_caps(sk, tp->dst);
}
rcu_read_unlock();
goto chksum;
}
if (sctp_checksum_disable)
return 1;
if (!(skb_dst(head)->dev->features & NETIF_F_SCTP_CRC) ||
dst_xfrm(skb_dst(head)) || packet->ipfragok) {
if (!(tp->dst->dev->features & NETIF_F_SCTP_CRC) ||
dst_xfrm(tp->dst) || packet->ipfragok || tp->encap_port) {
struct sctphdr *sh =
(struct sctphdr *)skb_transport_header(head);
......@@ -548,7 +542,6 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
struct sctp_association *asoc = tp->asoc;
struct sctp_chunk *chunk, *tmp;
int pkt_count, gso = 0;
struct dst_entry *dst;
struct sk_buff *head;
struct sctphdr *sh;
struct sock *sk;
......@@ -585,13 +578,18 @@ int sctp_packet_transmit(struct sctp_packet *packet, gfp_t gfp)
sh->checksum = 0;
/* drop packet if no dst */
dst = dst_clone(tp->dst);
if (!dst) {
if (!tp->dst) {
IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
kfree_skb(head);
goto out;
}
skb_dst_set(head, dst);
rcu_read_lock();
if (__sk_dst_get(sk) != tp->dst) {
dst_hold(tp->dst);
sk_setup_caps(sk, tp->dst);
}
rcu_read_unlock();
/* pack up chunks */
pkt_count = sctp_packet_pack(packet, head, gso, gfp);
......
......@@ -44,6 +44,7 @@
#include <net/addrconf.h>
#include <net/inet_common.h>
#include <net/inet_ecn.h>
#include <net/udp_tunnel.h>
#define MAX_SCTP_PORT_HASH_ENTRIES (64 * 1024)
......@@ -840,6 +841,93 @@ static int sctp_ctl_sock_init(struct net *net)
return 0;
}
static int sctp_udp_rcv(struct sock *sk, struct sk_buff *skb)
{
memset(skb->cb, 0, sizeof(skb->cb));
SCTP_INPUT_CB(skb)->encap_port = udp_hdr(skb)->source;
skb_set_transport_header(skb, sizeof(struct udphdr));
sctp_rcv(skb);
return 0;
}
static int sctp_udp_err_lookup(struct sock *sk, struct sk_buff *skb)
{
struct sctp_association *asoc;
struct sctp_transport *t;
int family;
skb->transport_header += sizeof(struct udphdr);
family = (ip_hdr(skb)->version == 4) ? AF_INET : AF_INET6;
sk = sctp_err_lookup(dev_net(skb->dev), family, skb, sctp_hdr(skb),
&asoc, &t);
if (!sk)
return -ENOENT;
sctp_err_finish(sk, t);
return 0;
}
int sctp_udp_sock_start(struct net *net)
{
struct udp_tunnel_sock_cfg tuncfg = {NULL};
struct udp_port_cfg udp_conf = {0};
struct socket *sock;
int err;
udp_conf.family = AF_INET;
udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
udp_conf.local_udp_port = htons(net->sctp.udp_port);
err = udp_sock_create(net, &udp_conf, &sock);
if (err) {
pr_err("Failed to create the SCTP UDP tunneling v4 sock\n");
return err;
}
tuncfg.encap_type = 1;
tuncfg.encap_rcv = sctp_udp_rcv;
tuncfg.encap_err_lookup = sctp_udp_err_lookup;
setup_udp_tunnel_sock(net, sock, &tuncfg);
net->sctp.udp4_sock = sock->sk;
#if IS_ENABLED(CONFIG_IPV6)
memset(&udp_conf, 0, sizeof(udp_conf));
udp_conf.family = AF_INET6;
udp_conf.local_ip6 = in6addr_any;
udp_conf.local_udp_port = htons(net->sctp.udp_port);
udp_conf.use_udp6_rx_checksums = true;
udp_conf.ipv6_v6only = true;
err = udp_sock_create(net, &udp_conf, &sock);
if (err) {
pr_err("Failed to create the SCTP UDP tunneling v6 sock\n");
udp_tunnel_sock_release(net->sctp.udp4_sock->sk_socket);
net->sctp.udp4_sock = NULL;
return err;
}
tuncfg.encap_type = 1;
tuncfg.encap_rcv = sctp_udp_rcv;
tuncfg.encap_err_lookup = sctp_udp_err_lookup;
setup_udp_tunnel_sock(net, sock, &tuncfg);
net->sctp.udp6_sock = sock->sk;
#endif
return 0;
}
void sctp_udp_sock_stop(struct net *net)
{
if (net->sctp.udp4_sock) {
udp_tunnel_sock_release(net->sctp.udp4_sock->sk_socket);
net->sctp.udp4_sock = NULL;
}
if (net->sctp.udp6_sock) {
udp_tunnel_sock_release(net->sctp.udp6_sock->sk_socket);
net->sctp.udp6_sock = NULL;
}
}
/* Register address family specific functions. */
int sctp_register_af(struct sctp_af *af)
{
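What `sctp_udp_rcv()` does to an incoming datagram can be sketched in userspace on a raw byte buffer: remember the UDP source port as the peer's encapsulation port, then skip the 8-byte UDP header to reach the SCTP common header. This is an illustration only; `strip_udp_encap` is a hypothetical name, and unlike the kernel (which keeps the port as `__be16`) it returns the port in host byte order for readability.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UDP_HDR_LEN 8

/* buf points at the UDP header of a received datagram of len bytes.
 * On success, stores the UDP source port in *encap_port and returns a
 * pointer to the SCTP common header. Returns NULL if the buffer is too
 * short to contain a UDP header.
 */
static const uint8_t *strip_udp_encap(const uint8_t *buf, size_t len,
				      uint16_t *encap_port)
{
	uint16_t src;

	if (len < UDP_HDR_LEN)
		return NULL;

	memcpy(&src, buf, sizeof(src)); /* bytes 0-1: UDP source port */
	*encap_port = ntohs(src);
	return buf + UDP_HDR_LEN;
}
```

Recording the source port on receive is what lets a server learn the client's (possibly NAT-remapped) port without any configuration.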
......@@ -971,25 +1059,44 @@ static int sctp_inet_supported_addrs(const struct sctp_sock *opt,
}
/* Wrapper routine that calls the ip transmit routine. */
static inline int sctp_v4_xmit(struct sk_buff *skb,
struct sctp_transport *transport)
static inline int sctp_v4_xmit(struct sk_buff *skb, struct sctp_transport *t)
{
struct inet_sock *inet = inet_sk(skb->sk);
struct dst_entry *dst = dst_clone(t->dst);
struct flowi4 *fl4 = &t->fl.u.ip4;
struct sock *sk = skb->sk;
struct inet_sock *inet = inet_sk(sk);
__u8 dscp = inet->tos;
__be16 df = 0;
pr_debug("%s: skb:%p, len:%d, src:%pI4, dst:%pI4\n", __func__, skb,
skb->len, &transport->fl.u.ip4.saddr,
&transport->fl.u.ip4.daddr);
skb->len, &fl4->saddr, &fl4->daddr);
if (transport->dscp & SCTP_DSCP_SET_MASK)
dscp = transport->dscp & SCTP_DSCP_VAL_MASK;
if (t->dscp & SCTP_DSCP_SET_MASK)
dscp = t->dscp & SCTP_DSCP_VAL_MASK;
inet->pmtudisc = t->param_flags & SPP_PMTUD_ENABLE ? IP_PMTUDISC_DO
: IP_PMTUDISC_DONT;
SCTP_INC_STATS(sock_net(sk), SCTP_MIB_OUTSCTPPACKS);
inet->pmtudisc = transport->param_flags & SPP_PMTUD_ENABLE ?
IP_PMTUDISC_DO : IP_PMTUDISC_DONT;
if (!t->encap_port || !sctp_sk(sk)->udp_port) {
skb_dst_set(skb, dst);
return __ip_queue_xmit(sk, skb, &t->fl, dscp);
}
SCTP_INC_STATS(sock_net(&inet->sk), SCTP_MIB_OUTSCTPPACKS);
if (skb_is_gso(skb))
skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
return __ip_queue_xmit(&inet->sk, skb, &transport->fl, dscp);
if (ip_dont_fragment(sk, dst) && !skb->ignore_df)
df = htons(IP_DF);
skb->encapsulation = 1;
skb_reset_inner_mac_header(skb);
skb_reset_inner_transport_header(skb);
skb_set_inner_ipproto(skb, IPPROTO_SCTP);
udp_tunnel_xmit_skb((struct rtable *)dst, sk, skb, fl4->saddr,
fl4->daddr, dscp, ip4_dst_hoplimit(dst), df,
sctp_sk(sk)->udp_port, t->encap_port, false, false);
return 0;
}
static struct sctp_af sctp_af_inet;
......@@ -1054,9 +1161,15 @@ static struct inet_protosw sctp_stream_protosw = {
.flags = SCTP_PROTOSW_FLAG
};
static int sctp4_rcv(struct sk_buff *skb)
{
memset(skb->cb, 0, sizeof(skb->cb));
return sctp_rcv(skb);
}
/* Register with IP layer. */
static const struct net_protocol sctp_protocol = {
.handler = sctp_rcv,
.handler = sctp4_rcv,
.err_handler = sctp_v4_err,
.no_policy = 1,
.netns_ok = 1,
......@@ -1271,6 +1384,12 @@ static int __net_init sctp_defaults_init(struct net *net)
/* Enable ECN by default. */
net->sctp.ecn_enable = 1;
/* Set UDP tunneling listening port to 0 by default */
net->sctp.udp_port = 0;
/* Set remote encap port to 0 by default */
net->sctp.encap_port = 0;
/* Set SCOPE policy to enabled */
net->sctp.scope_policy = SCTP_SCOPE_POLICY_ENABLE;
......
......@@ -1142,6 +1142,26 @@ struct sctp_chunk *sctp_make_violation_max_retrans(
return retval;
}
struct sctp_chunk *sctp_make_new_encap_port(const struct sctp_association *asoc,
const struct sctp_chunk *chunk)
{
struct sctp_new_encap_port_hdr nep;
struct sctp_chunk *retval;
retval = sctp_make_abort(asoc, chunk,
sizeof(struct sctp_errhdr) + sizeof(nep));
if (!retval)
goto nodata;
sctp_init_cause(retval, SCTP_ERROR_NEW_ENCAP_PORT, sizeof(nep));
nep.cur_port = SCTP_INPUT_CB(chunk->skb)->encap_port;
nep.new_port = chunk->transport->encap_port;
sctp_addto_chunk(retval, sizeof(nep), &nep);
nodata:
return retval;
}
/* Make a HEARTBEAT chunk. */
struct sctp_chunk *sctp_make_heartbeat(const struct sctp_association *asoc,
const struct sctp_transport *transport)
......@@ -2321,6 +2341,7 @@ int sctp_process_init(struct sctp_association *asoc, struct sctp_chunk *chunk,
* added as the primary transport. The source address seems to
* be a better choice than any of the embedded addresses.
*/
asoc->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port;
if (!sctp_assoc_add_peer(asoc, peer_addr, gfp, SCTP_ACTIVE))
goto nomem;
......
......@@ -87,6 +87,13 @@ static enum sctp_disposition sctp_sf_tabort_8_4_8(
const union sctp_subtype type,
void *arg,
struct sctp_cmd_seq *commands);
static enum sctp_disposition sctp_sf_new_encap_port(
struct net *net,
const struct sctp_endpoint *ep,
const struct sctp_association *asoc,
const union sctp_subtype type,
void *arg,
struct sctp_cmd_seq *commands);
static struct sctp_sackhdr *sctp_sm_pull_sack(struct sctp_chunk *chunk);
static enum sctp_disposition sctp_stop_t1_and_abort(
......@@ -1493,6 +1500,10 @@ static enum sctp_disposition sctp_sf_do_unexpected_init(
if (!sctp_chunk_length_valid(chunk, sizeof(struct sctp_init_chunk)))
return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
commands);
if (SCTP_INPUT_CB(chunk->skb)->encap_port != chunk->transport->encap_port)
return sctp_sf_new_encap_port(net, ep, asoc, type, arg, commands);
/* Grab the INIT header. */
chunk->subh.init_hdr = (struct sctp_inithdr *)chunk->skb->data;
......@@ -3392,6 +3403,45 @@ static enum sctp_disposition sctp_sf_tabort_8_4_8(
sctp_packet_append_chunk(packet, abort);
sctp_add_cmd_sf(commands, SCTP_CMD_SEND_PKT, SCTP_PACKET(packet));
SCTP_INC_STATS(net, SCTP_MIB_OUTCTRLCHUNKS);
sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
return SCTP_DISPOSITION_CONSUME;
}
/* Handling of SCTP Packets Containing an INIT Chunk Matching an
* Existing Associations when the UDP encap port is incorrect.
*
* From Section 4 at draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.
*/
static enum sctp_disposition sctp_sf_new_encap_port(
struct net *net,
const struct sctp_endpoint *ep,
const struct sctp_association *asoc,
const union sctp_subtype type,
void *arg,
struct sctp_cmd_seq *commands)
{
struct sctp_packet *packet = NULL;
struct sctp_chunk *chunk = arg;
struct sctp_chunk *abort;
packet = sctp_ootb_pkt_new(net, asoc, chunk);
if (!packet)
return SCTP_DISPOSITION_NOMEM;
abort = sctp_make_new_encap_port(asoc, chunk);
if (!abort) {
sctp_ootb_pkt_free(packet);
return SCTP_DISPOSITION_NOMEM;
}
abort->skb->sk = ep->base.sk;
sctp_packet_append_chunk(packet, abort);
sctp_add_cmd_sf(commands, SCTP_CMD_SEND_PKT,
SCTP_PACKET(packet));
......@@ -6268,6 +6318,8 @@ static struct sctp_packet *sctp_ootb_pkt_new(
if (!transport)
goto nomem;
transport->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port;
/* Cache a route for the transport with the chunk's destination as
* the source address.
*/
......
......@@ -4417,6 +4417,55 @@ static int sctp_setsockopt_pf_expose(struct sock *sk,
return retval;
}
static int sctp_setsockopt_encap_port(struct sock *sk,
struct sctp_udpencaps *encap,
unsigned int optlen)
{
struct sctp_association *asoc;
struct sctp_transport *t;
__be16 encap_port;
if (optlen != sizeof(*encap))
return -EINVAL;
/* If an address other than INADDR_ANY is specified, and
* no transport is found, then the request is invalid.
*/
encap_port = (__force __be16)encap->sue_port;
if (!sctp_is_any(sk, (union sctp_addr *)&encap->sue_address)) {
t = sctp_addr_id2transport(sk, &encap->sue_address,
encap->sue_assoc_id);
if (!t)
return -EINVAL;
t->encap_port = encap_port;
return 0;
}
/* Get association, if assoc_id != SCTP_FUTURE_ASSOC and the
* socket is a one to many style socket, and an association
* was not found, then the id was invalid.
*/
asoc = sctp_id2assoc(sk, encap->sue_assoc_id);
if (!asoc && encap->sue_assoc_id != SCTP_FUTURE_ASSOC &&
sctp_style(sk, UDP))
return -EINVAL;
/* If changes are for association, also apply encap_port to
* each transport.
*/
if (asoc) {
list_for_each_entry(t, &asoc->peer.transport_addr_list,
transports)
t->encap_port = encap_port;
return 0;
}
sctp_sk(sk)->encap_port = encap_port;
return 0;
}
/* API 6.2 setsockopt(), getsockopt()
*
* Applications use setsockopt() and getsockopt() to set or retrieve
......@@ -4636,6 +4685,9 @@ static int sctp_setsockopt(struct sock *sk, int level, int optname,
case SCTP_EXPOSE_POTENTIALLY_FAILED_STATE:
retval = sctp_setsockopt_pf_expose(sk, kopt, optlen);
break;
case SCTP_REMOTE_UDP_ENCAPS_PORT:
retval = sctp_setsockopt_encap_port(sk, kopt, optlen);
break;
default:
retval = -ENOPROTOOPT;
break;
......@@ -4876,6 +4928,8 @@ static int sctp_init_sock(struct sock *sk)
* be modified via SCTP_PEER_ADDR_PARAMS
*/
sp->hbinterval = net->sctp.hb_interval;
sp->udp_port = htons(net->sctp.udp_port);
sp->encap_port = htons(net->sctp.encap_port);
sp->pathmaxrxt = net->sctp.max_retrans_path;
sp->pf_retrans = net->sctp.pf_retrans;
sp->ps_retrans = net->sctp.ps_retrans;
......@@ -7790,6 +7844,65 @@ static int sctp_getsockopt_pf_expose(struct sock *sk, int len,
return retval;
}
static int sctp_getsockopt_encap_port(struct sock *sk, int len,
char __user *optval, int __user *optlen)
{
struct sctp_association *asoc;
struct sctp_udpencaps encap;
struct sctp_transport *t;
__be16 encap_port;
if (len < sizeof(encap))
return -EINVAL;
len = sizeof(encap);
if (copy_from_user(&encap, optval, len))
return -EFAULT;
/* If an address other than INADDR_ANY is specified, and
* no transport is found, then the request is invalid.
*/
if (!sctp_is_any(sk, (union sctp_addr *)&encap.sue_address)) {
t = sctp_addr_id2transport(sk, &encap.sue_address,
encap.sue_assoc_id);
if (!t) {
pr_debug("%s: failed no transport\n", __func__);
return -EINVAL;
}
encap_port = t->encap_port;
goto out;
}
/* Get association, if assoc_id != SCTP_FUTURE_ASSOC and the
* socket is a one to many style socket, and an association
* was not found, then the id was invalid.
*/
asoc = sctp_id2assoc(sk, encap.sue_assoc_id);
if (!asoc && encap.sue_assoc_id != SCTP_FUTURE_ASSOC &&
sctp_style(sk, UDP)) {
pr_debug("%s: failed no association\n", __func__);
return -EINVAL;
}
if (asoc) {
encap_port = asoc->encap_port;
goto out;
}
encap_port = sctp_sk(sk)->encap_port;
out:
encap.sue_port = (__force uint16_t)encap_port;
if (copy_to_user(optval, &encap, len))
return -EFAULT;
if (put_user(len, optlen))
return -EFAULT;
return 0;
}
static int sctp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen)
{
......@@ -8010,6 +8123,9 @@ static int sctp_getsockopt(struct sock *sk, int level, int optname,
case SCTP_EXPOSE_POTENTIALLY_FAILED_STATE:
retval = sctp_getsockopt_pf_expose(sk, len, optval, optlen);
break;
case SCTP_REMOTE_UDP_ENCAPS_PORT:
retval = sctp_getsockopt_encap_port(sk, len, optval, optlen);
break;
default:
retval = -ENOPROTOOPT;
break;
......
......@@ -36,6 +36,7 @@ static int rto_alpha_max = 1000;
static int rto_beta_max = 1000;
static int pf_expose_max = SCTP_PF_EXPOSE_MAX;
static int ps_retrans_max = SCTP_PS_RETRANS_MAX;
static int udp_port_max = 65535;
static unsigned long max_autoclose_min = 0;
static unsigned long max_autoclose_max =
......@@ -48,6 +49,8 @@ static int proc_sctp_do_rto_min(struct ctl_table *ctl, int write,
void *buffer, size_t *lenp, loff_t *ppos);
static int proc_sctp_do_rto_max(struct ctl_table *ctl, int write, void *buffer,
size_t *lenp, loff_t *ppos);
static int proc_sctp_do_udp_port(struct ctl_table *ctl, int write, void *buffer,
size_t *lenp, loff_t *ppos);
static int proc_sctp_do_alpha_beta(struct ctl_table *ctl, int write,
void *buffer, size_t *lenp, loff_t *ppos);
static int proc_sctp_do_auth(struct ctl_table *ctl, int write,
......@@ -290,6 +293,24 @@ static struct ctl_table sctp_net_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
{
.procname = "udp_port",
.data = &init_net.sctp.udp_port,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_sctp_do_udp_port,
.extra1 = SYSCTL_ZERO,
.extra2 = &udp_port_max,
},
{
.procname = "encap_port",
.data = &init_net.sctp.encap_port,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = proc_dointvec,
.extra1 = SYSCTL_ZERO,
.extra2 = &udp_port_max,
},
{
.procname = "addr_scope_policy",
.data = &init_net.sctp.scope_policy,
......@@ -477,6 +498,47 @@ static int proc_sctp_do_auth(struct ctl_table *ctl, int write,
return ret;
}
static int proc_sctp_do_udp_port(struct ctl_table *ctl, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
struct net *net = current->nsproxy->net_ns;
unsigned int min = *(unsigned int *)ctl->extra1;
unsigned int max = *(unsigned int *)ctl->extra2;
struct ctl_table tbl;
int ret, new_value;
memset(&tbl, 0, sizeof(struct ctl_table));
tbl.maxlen = sizeof(unsigned int);
if (write)
tbl.data = &new_value;
else
tbl.data = &net->sctp.udp_port;
ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
if (write && ret == 0) {
struct sock *sk = net->sctp.ctl_sock;
if (new_value > max || new_value < min)
return -EINVAL;
net->sctp.udp_port = new_value;
sctp_udp_sock_stop(net);
if (new_value) {
ret = sctp_udp_sock_start(net);
if (ret)
net->sctp.udp_port = 0;
}
/* Update the value in the control socket */
lock_sock(sk);
sctp_sk(sk)->udp_port = htons(net->sctp.udp_port);
release_sock(sk);
}
return ret;
}
int sctp_sysctl_net_register(struct net *net)
{
struct ctl_table *table;
......
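The control flow of `proc_sctp_do_udp_port()` on a write can be summarized with a small userspace model: reject out-of-range values, always stop the old tunneling sockets, start new ones only for a non-zero port, and fall back to port 0 if starting fails. The `model_*` names are hypothetical, and `start_err` stands in for the return value of `sctp_udp_sock_start()`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the per-netns SCTP state. */
struct model_net {
	int udp_port;   /* 0 means UDP tunneling is disabled */
	int socks_open; /* 1 while the tunneling sockets exist */
};

/* start_err: 0 if creating the sockets would succeed, else a negative
 * errno value to simulate a failure.
 */
static int model_set_udp_port(struct model_net *net, int new_value,
			      int start_err)
{
	if (new_value < 0 || new_value > 65535)
		return -EINVAL;

	net->udp_port = new_value;
	net->socks_open = 0; /* old sockets are always stopped first */

	if (new_value) {
		if (start_err) {
			net->udp_port = 0; /* fall back to disabled */
			return start_err;
		}
		net->socks_open = 1;
	}
	return 0;
}
```

Note that in this model, as in the handler above, a failed restart leaves tunneling disabled rather than restoring the previous port.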