Commits · 9ca78674eb6a19acbb1d69e86273ebd1d3edf087 · openanolis / cloud-kernel

30 Jun, 2018 10 commits

net/smc: add SMC-D diag support · 4b1b7d3b

Authored by Hans Wippel on Jun 28, 2018

This patch adds diag support for SMC-D.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: add SMC-D support in af_smc · 41349844

Authored by Hans Wippel on Jun 28, 2018

This patch ties together the previous SMC-D patches. It adds support for
SMC-D to the listen and connect functions and, thus, enables SMC-D
support in the SMC code. If a connection supports both SMC-R and SMC-D,
SMC-D is preferred.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: add SMC-D support in data transfer · be244f28

Authored by Hans Wippel on Jun 28, 2018

The data transfer and CDC message headers differ in SMC-R and SMC-D.
This patch adds support for the SMC-D data transfer to the existing SMC
code. It consists of the following:

* SMC-D CDC support
* SMC-D tx support
* SMC-D rx support

The CDC header is stored at the beginning of the receive buffer. Thus, an
rx_offset variable is added for the CDC header offset within the buffer
(0 for SMC-R).
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
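
The rx_offset idea from the commit above can be modeled in a few lines. This is a minimal Python sketch, not the kernel code: the header length `HYPOTHETICAL_CDC_HDR_LEN` and the helpers `make_rx_offset` and `read_payload` are names and values assumed purely for illustration. It only shows that payload positions are shifted by the header offset for SMC-D and left unshifted (offset 0) for SMC-R.

```python
# Illustrative model only; not the kernel implementation.
HYPOTHETICAL_CDC_HDR_LEN = 48  # assumed header size, chosen for the example


def make_rx_offset(is_smcd: bool) -> int:
    """Return the offset of the payload area within the receive buffer."""
    # SMC-D: the CDC header sits at the beginning of the receive buffer.
    # SMC-R: no header is stored in the buffer, so the offset is 0.
    return HYPOTHETICAL_CDC_HDR_LEN if is_smcd else 0


def read_payload(buf: bytes, rx_offset: int, cursor: int, length: int) -> bytes:
    """Read `length` payload bytes starting at `cursor`.

    The payload area begins at `rx_offset` and is treated as a ring,
    so reads wrap around at its end.
    """
    payload_len = len(buf) - rx_offset
    out = bytearray()
    for i in range(length):
        out.append(buf[rx_offset + (cursor + i) % payload_len])
    return bytes(out)
```

With an 8-byte payload area behind the assumed header, a 4-byte read at cursor 6 wraps to the start of the payload area rather than into the header.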

net/smc: add SMC-D support in CLC messages · c758dfdd

Authored by Hans Wippel on Jun 28, 2018

There are two types of SMC: SMC-R and SMC-D. These types are signaled
within the CLC messages during the CLC handshake. This patch adds
support for and checks of the SMC type.

Also, SMC-R and SMC-D need to exchange different information during the
CLC handshake. So, this patch extends the current message formats to
support the SMC-D header fields. The Proposal message can contain both
SMC-R and SMC-D information. The Accept and Confirm messages contain
either SMC-R or SMC-D information.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: add pnetid support for SMC-D and ISM · 1619f770

Authored by Hans Wippel on Jun 28, 2018

SMC-D relies on PNETIDs to find usable SMC-D/ISM devices for an SMC
connection. This patch adds SMC-D/ISM support to the current PNETID
implementation.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: add base infrastructure for SMC-D and ISM · c6ba7c9b

Authored by Hans Wippel on Jun 28, 2018

SMC supports two variants: SMC-R and SMC-D. For data transport, SMC-R
uses RDMA devices, SMC-D uses so-called Internal Shared Memory (ISM)
devices. An ISM device only allows shared memory communication between
SMC instances on the same machine. For example, this allows virtual
machines on the same host to communicate via SMC without RDMA devices.

This patch adds the base infrastructure for SMC-D and ISM devices to
the existing SMC code. It contains the following:

* ISM driver interface:
  This interface allows an ISM driver to register ISM devices in SMC. In
  the process, the driver provides a set of device ops for each device.
  SMC uses these ops to execute SMC specific operations on or transfer
  data over the device.

* Core SMC-D link group, connection, and buffer support:
  Link groups, SMC connections and SMC buffers (in smc_core) are
  extended to support SMC-D.

* SMC type checks:
  Some type checks are added to prevent using SMC-R specific code for
  SMC-D and vice versa.

To actually use SMC-D, additional changes to pnetid, CLC, CDC, etc. are
required. These are added in follow-up patches.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: optimize consumer cursor updates · e82f2e31

Authored by Ursula Braun on Jun 28, 2018

The SMC protocol requires sending a separate consumer cursor update
if it cannot be piggybacked onto updates of the producer cursor.
Currently, the decision to send a separate consumer cursor update
considers only the amount of data already received by the socket
program. It does not consider the amount of data that has already
arrived but has not yet been consumed by the receiver. Basing the
decision on the difference between already confirmed and already
arrived data (instead of the difference between already confirmed and
already consumed data) may lead to a somewhat earlier consumer cursor
update in fast unidirectional traffic scenarios, and thus to better
throughput.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
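
The change in the decision basis described above can be sketched as a small comparison. This is a simplified Python model under stated assumptions, not the kernel logic: positions are plain monotonically increasing byte counts (no cursor wrap-around), `limit` is a hypothetical threshold, and the function name `should_send_cursor_update` is made up for the example. The real condition in the SMC code may involve further factors.

```python
def should_send_cursor_update(confirmed: int, consumed: int, arrived: int,
                              limit: int, use_arrived: bool = True) -> bool:
    """Decide whether to send a separate consumer cursor update.

    confirmed: bytes already confirmed to the peer
    consumed:  bytes already consumed by the receiving program
    arrived:   bytes that have arrived in the receive buffer
    limit:     hypothetical threshold for sending an update

    use_arrived=False models the older basis (confirmed vs. consumed);
    use_arrived=True models the basis described in the commit message
    (confirmed vs. arrived).
    """
    reference = arrived if use_arrived else consumed
    return reference - confirmed > limit
```

Because arrived data is never behind consumed data, the arrived-based difference reaches the threshold no later than the consumed-based one, which is why the update can go out earlier.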

net/smc: add pnetid support · 0afff91c

Authored by Ursula Braun on Jun 28, 2018

s390 hardware supports the definition of a so-called Physical NETwork
IDentifier (PNETID for short) per network device port. These PNETIDs
can be used to identify network devices that are attached to the same
physical network (broadcast domain).

On s390, try to use the PNETID of the ethernet device port used for the
initial connection, and derive the IB device port used for SMC RDMA
traffic.

On platforms without PNETID support, fall back to the existing
solution of a configured pnet table.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
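
The lookup order described above (hardware PNETID first, configured pnet table as fallback) can be modeled roughly as follows. This is a simplified Python sketch, not the kernel implementation: the function `find_ib_port`, the dictionary shapes, and the device names used in the usage lines are all assumptions made for illustration.

```python
from typing import Dict, Optional


def find_ib_port(eth_dev: str,
                 hw_pnetids: Dict[str, str],
                 ib_ports: Dict[str, str],
                 pnet_table: Dict[str, str]) -> Optional[str]:
    """Pick an IB port on the same physical network as `eth_dev`.

    hw_pnetids: ethernet device -> PNETID reported by hardware
                (empty on platforms without PNETID support)
    ib_ports:   IB port name -> PNETID (or pnet name) it belongs to
    pnet_table: ethernet device -> configured pnet name (fallback)
    """
    # Prefer the hardware-provided PNETID of the ethernet port.
    pnetid = hw_pnetids.get(eth_dev)
    if pnetid is None:
        # No hardware PNETID: fall back to the configured pnet table.
        pnetid = pnet_table.get(eth_dev)
    if pnetid is None:
        return None
    # Return the first IB port attached to the same network.
    for port, port_pnetid in ib_ports.items():
        if port_pnetid == pnetid:
            return port
    return None
```

For example, `find_ib_port("eth0", {"eth0": "NET1"}, {"ib0/1": "NET1"}, {})` returns `"ib0/1"`, while an ethernet device known to neither source yields `None`.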

net/smc: determine port attributes independent from pnet table · be6a3f38

Authored by Ursula Braun on Jun 28, 2018

For SMC it is important to know the current port state of RoCE devices.
Until now, port state monitoring was triggered when a RoCE device was added
to the pnet table. To support future alternatives to the pnet table, the
monitoring of ports is made independent of the existence of a pnet table.
It starts once the smc_ib_device is established.

Due to this change, smc_ib_remember_port_attr() is now a local function.
Moving it and the functions it uses makes any forward references
unnecessary.

In addition, the duplicate SMC_MAX_PORTS definition is removed.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add new SNMP counter for drops when try to queue in rcv queue · ea5d0c32

Authored by Yafang Shao on Jun 28, 2018

When sk_rmem_alloc is larger than the receive buffer and we can't
schedule more memory for it, the skb will be dropped.

In this situation, if the skb was meant for the ofo queue,
LINUX_MIB_TCPOFODROP is incremented to track the drop.

If the skb was meant for the receive queue, however, nothing is recorded.
So a new SNMP counter is introduced to track this behavior.

LINUX_MIB_TCPRCVQDROP:  Number of packets meant to be queued in rcv queue
			but dropped because socket rcvbuf limit hit.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
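
To observe such a counter from user space, the paired header/value lines of `/proc/net/netstat` can be parsed. The following is a minimal Python sketch; the helper name `parse_netstat` is made up, and it is an assumption here that the new counter is exposed under the `TcpExt` prefix as `TCPRcvQDrop` (alongside `TCPOFODrop`). The sample text in the usage note is fabricated input for illustration, not real output.

```python
from typing import Dict


def parse_netstat(text: str) -> Dict[str, int]:
    """Parse /proc/net/netstat-style text into {"Prefix.Name": value}.

    The format consists of line pairs: a header line listing counter
    names and a following line listing the values, both starting with
    the same "Prefix:" tag.
    """
    counters: Dict[str, int] = {}
    lines = text.strip().splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, vals = values.split(":", 1)
        if prefix != value_prefix:
            continue  # skip malformed pairs
        for name, val in zip(names.split(), vals.split()):
            counters[f"{prefix}.{name}"] = int(val)
    return counters
```

On a live system one would pass the contents of `/proc/net/netstat` and look up `counters.get("TcpExt.TCPRcvQDrop")`; `get` is used because older kernels do not have the counter.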

29 Jun, 2018 9 commits

net/sched: add tunnel option support to act_tunnel_key · 0ed5269f

Authored by Simon Horman on Jun 26, 2018

Allow setting tunnel options using the act_tunnel_key action.

Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.

 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
            set src_ip 10.0.99.192 \
            dst_ip 10.0.99.193 \
            dst_port 6081 \
            id 11 \
            geneve_opts 0102:80:00800022,0102:80:00800022 \
    action mirred egress redirect dev geneve0
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
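
The class:type:data option syntax used in the `geneve_opts` argument above can be illustrated with a small parser. This is a Python sketch under assumptions, not the tc or kernel parser: it assumes all three fields are hexadecimal, as the example value suggests, and the names `TunnelOption` and `parse_geneve_opts` are hypothetical.

```python
from typing import List, NamedTuple


class TunnelOption(NamedTuple):
    opt_class: int   # option class (assumed hexadecimal in the input)
    opt_type: int    # option type (assumed hexadecimal in the input)
    data: bytes      # option data, decoded from a hex string


def parse_geneve_opts(spec: str) -> List[TunnelOption]:
    """Parse a comma-delimited list of class:type:data options."""
    options = []
    for item in spec.split(","):
        # Each option must have exactly three colon-separated fields;
        # anything else raises ValueError from the unpacking below.
        opt_class, opt_type, data = item.split(":")
        options.append(TunnelOption(int(opt_class, 16),
                                    int(opt_type, 16),
                                    bytes.fromhex(data)))
    return options
```

Applied to the value from the example command, `parse_geneve_opts("0102:80:00800022,0102:80:00800022")` yields two identical options with class 0x0102, type 0x80 and four bytes of data.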

net: check tunnel option type in tunnel flags · 256c87c1

Authored by Pieter Jansen van Vuuren on Jun 26, 2018

Check the tunnel option type stored in tunnel flags when creating options
for tunnels. This ensures that we do not set geneve, vxlan or erspan tunnel
options on interfaces that are not associated with them.

Make sure all users of the infrastructure set the correct flags. For the BPF
helper, we have to set all bits to keep backward compatibility.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_tunnel_key: add extended ack support · 9d7298cd

Authored by Simon Horman on Jun 26, 2018

Add extended ack support for the tunnel key action by using NL_SET_ERR_MSG
during validation of user input.

Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_tunnel_key: disambiguate metadata dst error cases · a1165b59

Authored by Simon Horman on Jun 26, 2018

Metadata may be NULL for one of two reasons:
* Missing user input
* Failure to allocate the metadata dst

Disambiguate these cases by returning -EINVAL for the former and -ENOMEM
for the latter rather than -EINVAL for both cases.

This is in preparation for using extended ack to provide more information
to users when parsing their input.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: add support for SCTP_REUSE_PORT sockopt · b0e9a2fe

Authored by Xin Long on Jun 28, 2018

This feature is actually already supported by sk->sk_reuse which can be
set by socket level opt SO_REUSEADDR. But it's not working exactly as
RFC6458 demands in section 8.1.27, like:

  - This option only supports one-to-one style SCTP sockets
  - This socket option must not be used after calling bind()
    or sctp_bindx().

Besides, the SCTP_REUSE_PORT sockopt should be provided for user programs.
Otherwise, programs that use SCTP_REUSE_PORT on other systems will not
work on Linux.

To separate it from the socket level version, this patch adds 'reuse' in
sctp_sock and it works pretty much as sk->sk_reuse, but with some extra
setup limitations that are needed when it is being enabled.

"It should be noted that the behavior of the socket-level socket option
to reuse ports and/or addresses for SCTP sockets is unspecified", so this
patch leaves SO_REUSEADDR as is for compatibility.

Note that the name SCTP_REUSE_PORT is somewhat confusing, as its
functionality is nearly identical to SO_REUSEADDR, but with some
extra restrictions. Here it uses 'reuse' in sctp_sock instead of
'reuseport'. As for sk->sk_reuseport support for SCTP, it will be
added in another patch.

Thanks to Neil for making this clear.

v1->v2:
  - add sctp_sk->reuse to separate it from the socket level version.
v2->v3:
  - improve changelog according to Marcelo's suggestion.
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ila: Flush netlink command to clear xlat table · b6e71bde

Authored by Tom Herbert on Jun 27, 2018

Add ILA_CMD_FLUSH netlink command to clear the ILA translation table.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ila: Create main ila source file · ad68147e

Authored by Tom Herbert on Jun 27, 2018

Create a main ila file that contains the module initialization functions
as well as netlink definitions. Previously these were defined in
ila_xlat and ila_common. This approach allows better extensibility.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ila: Call library function alloc_bucket_locks · b8932817

Authored by Tom Herbert on Jun 27, 2018

To allocate the array of bucket locks for the hash table we now
call library function alloc_bucket_spinlocks.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ila: Fix use of rhashtable walk in ila_xlat.c · f7a2ba5a

Authored by Tom Herbert on Jun 27, 2018

Perform better EAGAIN handling, handle the case where ila_dump_info
fails and we missed objects in the dump, and add a skip index
to skip over ila entries in a list on a rhashtable node that have
already been visited (by a previous call to ila_nl_dump).
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Jun, 2018 10 commits

skbuff: preserve sock reference when scrubbing the skb. · 9c4c3252

Authored by Flavio Leitner on Jun 27, 2018

The sock reference is lost when scrubbing the packet and that breaks
TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
performance impacts of about 50% in a single TCP stream when crossing
network namespaces.

XPS breaks because the queue mapping stored in the socket is not
available, so another random queue might be selected when the stack
needs to transmit something like a TCP ACK, or TCP Retransmissions.
That causes packet re-ordering and/or performance issues.

TSQ breaks because it orphans the packet while it is still in the
host, so packets are queued contributing to the buffer bloat problem.

Preserving the sock reference fixes both issues. The socket is
orphaned anyway in the receiving path before any relevant action,
and on the TX side netfilter checks if the reference is local before
using it.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: check if the socket netns is correct. · f5646501

Authored by Flavio Leitner on Jun 27, 2018

Netfilter assumes that if the socket is present in the skb, then
it can be used because that reference is cleaned up while the skb
is crossing netns.

We want to change that to preserve the socket reference in a future
patch, so this is a preparatory change that updates netfilter to check
whether the socket netns matches before using it.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net sched actions: avoid bitwise operation on signed value in pedit · 43052741