Commits · 71ad8d55f8e5ea101069b552422f392655e2ffb6 · openeuler / Kernel

10 Jul, 2020 3 commits

devlink: Replace devlink_port_attrs_set parameters with a struct · 71ad8d55

Danielle Ratson authored Jul 09, 2020

Currently, devlink_port_attrs_set accepts a long list of parameters,
most of which are devlink port attributes.

Use the devlink_port_attrs struct to replace the relevant parameters.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
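
The refactoring pattern described in this commit, passing one attributes struct instead of a long parameter list, can be sketched in user-space C. This is only an illustration: `struct port`, `struct port_attrs`, `port_attrs_set()` and the field names are hypothetical stand-ins, not the devlink definitions.

```c
#include <stdbool.h>

/* Hypothetical attribute bundle; the real devlink_port_attrs differs. */
struct port_attrs {
	int flavour;
	unsigned int port_number;
	bool split;
	unsigned int split_subport_number;
};

/* Hypothetical port object that owns a copy of its attributes. */
struct port {
	struct port_attrs attrs;
};

/* One struct pointer replaces what would otherwise be four
 * separate parameters; new attributes no longer change the signature. */
void port_attrs_set(struct port *port, const struct port_attrs *attrs)
{
	port->attrs = *attrs;
}
```

A caller would fill a `struct port_attrs` with designated initializers and pass its address, so adding a field later does not require touching every call site.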


devlink: Move switch_port attribute of devlink_port_attrs to devlink_port · 46737a19

Danielle Ratson authored Jul 09, 2020

The struct devlink_port_attrs holds the attributes of devlink_port.

Similarly to the previous patch, 'switch_port' attribute is another
exception.

Move 'switch_port' to be devlink_port's field.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


devlink: Move set attribute of devlink_port_attrs to devlink_port · 10a429ba

Danielle Ratson authored Jul 09, 2020

The struct devlink_port_attrs holds the attributes of devlink_port.

The 'set' field is not devlink_port's attribute as opposed to most of the
others.

Move 'set' to be devlink_port's field called 'attrs_set'.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


09 Jul, 2020 1 commit

net: dsa: tag_rtl4_a: Implement Realtek 4 byte A tag · efd7fe68

Linus Walleij authored Jul 08, 2020

This implements the known parts of the Realtek 4 byte
tag protocol version 0xA, as found in the RTL8366RB
DSA switch.

It is designated as protocol version 0xA as a
different Realtek 4 byte tag format with protocol
version 0x9 is known to exist in the Realtek RTL8306
chips.

The tag and switch chip lack public documentation, so
the tag format has been reverse-engineered from
packet dumps. As only ingress traffic has been available
for analysis an egress tag has not been possible to
develop (even using educated guesses about bit fields)
so this is as far as it gets. It is not known if the
switch even supports egress tagging.

Extensive attempts to figure out the egress tag format
were made. When nothing else worked, I just tried all bit
combinations with 0xannp where a is protocol and p is
port. I looped through all values several times trying
to get a response from ping, without any positive
result.

Using just these ingress tags however, the switch
functionality is vastly improved and the packets find
their way into the destination port without any
tricky VLAN configuration. On the D-Link DIR-685 the
LAN ports now come up and respond to ping without
any command line configuration so this is a real
improvement for users.

Egress packets need to be restricted to the proper
target ports using VLAN, which the RTL8366RB DSA
switch driver already sets up.

Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
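
The `0xannp` notation used in this commit message (a is protocol, p is port) can be read as a 16-bit word. The sketch below is an assumption-laden illustration, not the driver code: it assumes the protocol sits in the top nibble and the port in the bottom nibble, ignores the unknown middle bits, and uses the hypothetical names `rtl4a_fields` and `rtl4a_split()`.

```c
#include <stdint.h>

/* Hypothetical decoded view of a 16-bit 0xannp word. */
struct rtl4a_fields {
	uint8_t proto; /* assumed: top nibble, e.g. 0xA */
	uint8_t port;  /* assumed: bottom nibble */
};

/* Split the word under the assumed layout; the middle "nn" bits
 * are not documented in the commit message and are ignored here. */
struct rtl4a_fields rtl4a_split(uint16_t word)
{
	struct rtl4a_fields f;

	f.proto = (word >> 12) & 0xf;
	f.port = word & 0xf;
	return f;
}
```

Under these assumptions, a word of `0xa003` would decode to protocol 0xA and port 3.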


04 Jul, 2020 4 commits

netfilter: nf_tables: add NFT_CHAIN_BINDING · d0e2c7de

Pablo Neira Ayuso authored Jun 30, 2020

This new chain flag specifies that:

* the kernel dynamically allocates the chain name, if no chain name
  is specified.

* If the immediate expression that refers to this chain is removed,
  then this bound chain (and its content) is destroyed.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


netfilter: nf_tables: expose enum nft_chain_flags through UAPI · 67c49de4

Pablo Neira Ayuso authored Jun 30, 2020

This enum definition was never exposed through UAPI. Rename
NFT_BASE_CHAIN to NFT_CHAIN_BASE for consistency.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


netfilter: nf_tables: add NFTA_CHAIN_ID attribute · 74cccc3d

Pablo Neira Ayuso authored Jun 30, 2020

This netlink attribute allows you to refer to chains inside a
transaction as an alternative to the name and the handle. The chain
binding support requires this new chain ID approach.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


ipvs: allow connection reuse for unconfirmed conntrack · f0a5e4d7

Julian Anastasov authored Jul 01, 2020

YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.

As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.

YangYuxi lists some of the problem reports:

- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2

- IPVS low throughput #70747
https://github.com/kubernetes/kubernetes/issues/70747

- Apache Bench can fill up ipvs service proxy in seconds #544
https://github.com/cloudnativelabs/kube-router/issues/544

- Additional 1s latency in `host -> service IP -> pod`
https://github.com/kubernetes/kubernetes/issues/90854

Fixes: f719e375 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <yx.atom1@gmail.com>
Signed-off-by: YangYuxi <yx.atom1@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
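
The tuned check described in this commit boils down to one decision on the conntrack state. The following is a minimal sketch of that decision only, with hypothetical names (`reuse_action`, `reuse_decision()`); the actual IPVS code involves more state than a single flag.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a SYN that hits a TIME_WAIT connection. */
enum reuse_action {
	REUSE_DELAY,      /* keep the one-second expiry path */
	REUSE_RESCHEDULE  /* pick a new real server immediately */
};

/* A confirmed conntrack cannot be moved to a different real server,
 * so the delay still applies; a fresh (unconfirmed) conntrack can be
 * rescheduled without waiting. */
enum reuse_action reuse_decision(bool conntrack_confirmed)
{
	return conntrack_confirmed ? REUSE_DELAY : REUSE_RESCHEDULE;
}
```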


02 Jul, 2020 1 commit

bonding: allow xfrm offload setup post-module-load · a3b658cf

Jarod Wilson authored Jun 30, 2020

At the moment, bonding xfrm crypto offload can only be set up if the bonding
module is loaded with active-backup mode already set. We need to be able to
make this work with bonds set to AB after the bonding driver has already
been loaded.

So what's done here is:

1) move #define BOND_XFRM_FEATURES to net/bonding.h so it can be used
by both bond_main.c and bond_options.c
2) set BOND_XFRM_FEATURES in bond_dev->hw_features universally, rather than
only when loading in AB mode
3) wire up xfrmdev_ops universally too
4) disable BOND_XFRM_FEATURES in bond_dev->features if not AB
5) exit early (non-AB case) from bond_ipsec_offload_ok, to prevent a
performance hit from traversing into the underlying drivers
6) toggle BOND_XFRM_FEATURES in bond_dev->wanted_features and call
netdev_change_features() from bond_option_mode_set()

In my local testing, I can change bonding modes back and forth on the fly,
have hardware offload work when I'm in AB, and see no performance penalty
to non-AB software encryption, despite having xfrm bits all wired up for
all modes now.

Fixes: 18cb261a ("bonding: support hardware encryption offload to slaves")
Reported-by: Huy Nguyen <huyn@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


01 Jul, 2020 2 commits

net/tls: fix sign extension issue when left shifting u16 value · a6ed3ebc

Colin Ian King authored Jun 30, 2020

Left shifting the u16 value promotes it to an int and then it
gets sign extended to a u64.  If len << 16 is greater than 0x7fffffff
then the upper bits get set to 1 because of the implicit sign extension.
Fix this by casting len to u64 before shifting it.

Addresses-Coverity: ("integer handling issues")
Fixes: ed9b7646 ("net/tls: Add asynchronous resync")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
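
The sign-extension effect described in this commit can be reproduced in user-space C. This is a sketch rather than the kernel code: the function names are hypothetical, a 32-bit `int` is assumed, and the promoted shift is modeled with explicit casts so the example does not depend on signed-overflow behavior.

```c
#include <stdint.h>

/* Models `len << 16` evaluated as a 32-bit int and then widened to
 * 64 bits. The out-of-range conversion to int32_t is implementation-
 * defined; common compilers wrap, which yields a negative value. */
uint64_t widen_via_int(uint16_t len)
{
	int32_t shifted = (int32_t)((uint32_t)len << 16);

	return (uint64_t)(int64_t)shifted; /* sign extension happens here */
}

/* The fix from the commit message: widen first, then shift. */
uint64_t widen_fixed(uint16_t len)
{
	return (uint64_t)len << 16;
}
```

For lengths below 0x8000 both variants agree; from 0x8000 upward the first one sets the upper 32 bits, which is the bug the cast to u64 avoids.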


ipvs: register hooks only with services · 857ca897

Julian Anastasov authored Jun 21, 2020

Keep the IPVS hooks registered in Netfilter only
while there are configured virtual services. This
saves CPU cycles while IPVS is loaded but not used.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


30 Jun, 2020 4 commits

net:qos: police action offloading parameter 'burst' change to the original value · 5f035af7

Po Liu authored Jun 29, 2020

Since 'tcfp_burst' is stored with a TICK factor applied, drivers always
need to recover the original value. This patch moves that generic
calculation out of the drivers, so that 'burst' is restored to its
original value before it is offloaded to the device driver.
Signed-off-by: Po Liu <po.liu@nxp.com>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: sched: sch_red: Add qevents "early_drop" and "mark" · aee9caa0

Petr Machata authored Jun 27, 2020

In order to allow acting on dropped and/or ECN-marked packets, add two new
qevents to the RED qdisc: "early_drop" and "mark". Filters attached at
"early_drop" block are executed as packets are early-dropped, those
attached at the "mark" block are executed as packets are ECN-marked.

Two new attributes are introduced: TCA_RED_EARLY_DROP_BLOCK with the block
index for the "early_drop" qevent, and TCA_RED_MARK_BLOCK for the "mark"
qevent. Absence of these attributes signifies "don't care": no block is
allocated in that case, or the existing blocks are left intact in case of
the change callback.

For purposes of offloading, blocks attached to these qevents appear with
newly-introduced binder types, FLOW_BLOCK_BINDER_TYPE_RED_EARLY_DROP and
FLOW_BLOCK_BINDER_TYPE_RED_MARK.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: sched: Introduce helpers for qevent blocks · 3625750f

Petr Machata authored Jun 27, 2020

Qevents are attach points for TC blocks, where filters can be put that are
executed when "interesting events" take place in a qdisc. The data to keep
and the functions to invoke to maintain a qevent will be largely the same
between qevents. Therefore introduce sched-wide helpers for qevent
management.

Currently, similarly to ingress and egress blocks of clsact pseudo-qdisc,
blocks attachment cannot be changed after the qdisc is created. To that
end, add a helper tcf_qevent_validate_change(), which verifies whether
block index attribute is not attached, or if it is, whether its value
matches the current one (i.e. there is no material change).

The function tcf_qevent_handle() should be invoked when qdisc hits the
"interesting event" corresponding to a block. This function releases root
lock for the duration of executing the attached filters, to allow packets
generated through user actions (notably mirred) to be reinserted to the
same qdisc tree.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
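
The rule that tcf_qevent_validate_change() is described as enforcing can be sketched as a small user-space function. This is a simplified illustration with a hypothetical name and signature (`qevent_validate_change()`, with a NULL pointer standing in for an absent netlink attribute), not the kernel helper itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 when the request is acceptable, -EINVAL otherwise.
 * attr == NULL models "block index attribute not present". */
int qevent_validate_change(uint32_t current_index, const uint32_t *attr)
{
	if (attr == NULL)
		return 0;       /* nothing requested: no change */
	if (*attr != current_index)
		return -EINVAL; /* re-binding after creation is refused */
	return 0;               /* same value: no material change */
}
```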


net: sched: Pass root lock to Qdisc_ops.enqueue · aebe4426

Petr Machata authored Jun 27, 2020

A following patch introduces qevents, points in qdisc algorithm where
packet can be processed by user-defined filters. Should this processing
lead to a situation where a new packet is to be enqueued on the same port,
holding the root lock would lead to deadlocks. To solve the issue, qevent
handler needs to unlock and relock the root lock when necessary.

To that end, add the root lock argument to the qdisc op enqueue, and
propagate throughout.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


29 Jun, 2020 2 commits

sctp: use list_is_singular in sctp_list_single_entry · 6fc3e68f

Geliang Tang authored Jun 28, 2020

Use list_is_singular() instead of open-coding.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
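
For readers unfamiliar with the helper named in this commit, here is a user-space re-creation of a circular doubly linked list with a "exactly one entry" test. The names mirror the kernel's list API, but these are local re-implementations written for illustration, and `init_list_head()` is a hypothetical stand-in for the kernel's initializer.

```c
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* True when the list holds exactly one entry: it is not empty and
 * the first and last entries are the same node. */
bool list_is_singular(const struct list_head *head)
{
	return !list_empty(head) && head->next == head->prev;
}
```

Using the named helper instead of comparing `next` and `prev` by hand keeps the intent visible at the call site.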


bareudp: Added attribute to enable & disable rx metadata collection · fe80536a

Martin authored Jun 28, 2020

Metadata need not be collected in receive if the packet from bareudp
device is not targeted to openvswitch.
Signed-off-by: Martin <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


28 Jun, 2020 2 commits

net/tls: Add asynchronous resync · ed9b7646

Boris Pismenny authored Jun 08, 2020

This patch adds support for asynchronous resynchronization in tls_device.
Async resync follows two distinct stages:

1. The NIC driver indicates that it would like to resync on some TLS
record within the received packet (P), but the driver does not
know (yet) which of the TLS records within the packet.
At this stage, the NIC driver will query the device to find the exact
TCP sequence for resync (tcpsn), however, the driver does not wait
for the device to provide the response.

2. Eventually, the device responds, and the driver provides the tcpsn
within the resync packet to KTLS. Now, KTLS can check the tcpsn against
any processed TLS records within packet P, and also against any record
that is processed in the future within packet P.

The asynchronous resync path simplifies the device driver, as it can
save bits on the packet completion (32-bit TCP sequence), and pass this
information on an asynchronous command instead.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


Revert "net/tls: Add force_resync for driver resync" · acb5a07a

Boris Pismenny authored Jun 08, 2020

This reverts commit b3ae2459.
Revert the force resync API.
Not in use. To be replaced by a better async resync API downstream.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>


26 Jun, 2020 1 commit

sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket · 471e39df

Marcelo Ricardo Leitner authored Jun 24, 2020

If a socket is set ipv6only, it will still send IPv4 addresses in the
INIT and INIT_ACK packets. This potentially misleads the peer into using
them, which then would cause association termination.

The fix is to not add IPv4 addresses to ipv6only sockets.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Reported-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
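
The fix described in this commit amounts to a per-address filter. The sketch below shows that filter in isolation; `may_advertise()` is a hypothetical name, and the real SCTP code applies the check while building its address lists rather than through a helper like this.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Decide whether an address of the given family may be listed in
 * INIT / INIT_ACK for a socket with the given ipv6only setting. */
bool may_advertise(int family, bool ipv6only)
{
	if (ipv6only && family == AF_INET)
		return false; /* never offer IPv4 on an IPv6-only socket */
	return true;
}
```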


25 Jun, 2020 5 commits

net: qos: police action add index for tc flower offloading · 627e39b1

Po Liu authored Jun 24, 2020

A hardware device may include more than one police entry. Specifying the
action's index makes it possible for several tc filters to share the same
police action when installing the filters.

Propagate this index to device drivers through the flow offload
intermediate representation, so that drivers could share a single
hardware policer between multiple filters.

v1->v2 changes:
- Update the commit message as suggested by Ido Schimmel <idosch@idosch.org>
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: qos: add tc police offloading action with max frame size limit · 19e528dc

Po Liu authored Jun 24, 2020

Current police offloading supports 'burst' and 'rate_bytes_ps'. Some
hardware is also capable of limiting the frame size: if a frame is
larger than the configured limit, it is dropped. The police action
itself already accepts the 'mtu' parameter in the tc command, but the
parameter is not passed on to tc flower offloading. So extend 'mtu' to
tc flower offloading.
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: bpf: Add bpf_seq_afinfo in udp_iter_state · 9e8ca27a

Yonghong Song authored Jun 23, 2020

Similar to tcp_iter_state, a new field bpf_seq_afinfo is
added to udp_iter_state to provide bpf udp iterator
afinfo.

This does not change /proc/net/{udp, udp6} behavior. But
it enables bpf iterator to avoid getting afinfo from PDE_DATA
and iterate through all udp and udp6 sockets in one pass.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230812.3988347-1-yhs@fb.com


net: bpf: Add bpf_seq_afinfo in tcp_iter_state · b08d4d3b

Yonghong Song authored Jun 23, 2020

A new field bpf_seq_afinfo is added to tcp_iter_state
to provide bpf tcp iterator afinfo. There are two
reasons why we did this.

First, the current way to get afinfo from PDE_DATA
does not work for bpf iterator as its seq_file
inode does not conform to /proc/net/{tcp,tcp6}
inode structures. More specifically, anonymous
bpf iterator will use an anonymous inode which
is shared in the system and we cannot change inode
private data structure at all.

Second, bpf iterator for tcp/tcp6 wants to
traverse all tcp and tcp6 sockets in one pass
and bpf program can control whether they want
to skip one sk_family or not. Having a different
afinfo with family AF_UNSPEC makes it easier
to understand in the code.

This patch does not change /proc/net/{tcp,tcp6} behavior
as the bpf_seq_afinfo will be NULL for these two proc files.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230804.3987829-1-yhs@fb.com


sock: Move sock_valbool_flag to header · dfde1d7d

Dmitry Yakunin authored Jun 20, 2020

This is preparation for usage in bpf_setsockopt.
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200620153052.9439-1-zeil@yandex-team.ru


24 Jun, 2020 9 commits

net: Do not clear the sock TX queue in sk_set_socket() · 41b14fb8

Tariq Toukan authored Jun 22, 2020

Clearing the sock TX queue in sk_set_socket() might cause unexpected
out-of-order transmit when called from sock_orphan(), as outstanding
packets can pick a different TX queue and bypass the ones already queued.

This is undesired in general. More specifically, it breaks the in-order
scheduling property guarantee for device-offloaded TLS sockets.

Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it
explicitly only where needed.

Fixes: e022f0b4 ("net: Introduce sk_tx_queue_mapping")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: ipv6: Use struct_size() helper and kcalloc() · 6f393457

Gustavo A. R. Silva authored Jun 22, 2020

Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes. Also, remove unnecessary
function ipv6_rpl_srh_alloc_size() and replace kzalloc() with kcalloc(),
which has a 2-factor argument form for multiplication.

This code was detected with the help of Coccinelle, and audited and
fixed manually.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
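
The idea behind a struct_size()-style helper, computing the size of a struct that ends in a flexible array without risking arithmetic overflow, can be sketched in user-space C. The types `addr16` and `seg_hdr` and the function `seg_hdr_size()` are hypothetical; saturating to `SIZE_MAX` on overflow is modeled on how the kernel helper is documented to behave.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte element, standing in for an IPv6 address. */
struct addr16 {
	uint8_t bytes[16];
};

/* Hypothetical header that ends in a flexible array member. */
struct seg_hdr {
	uint8_t type;
	uint8_t segments_left;
	struct addr16 segs[];
};

/* Bytes needed for a header with n trailing elements, or SIZE_MAX
 * if the multiplication or addition would overflow size_t. */
size_t seg_hdr_size(size_t n)
{
	size_t base = sizeof(struct seg_hdr);
	size_t elem = sizeof(struct addr16);

	if (n > (SIZE_MAX - base) / elem)
		return SIZE_MAX;
	return base + n * elem;
}
```

Because an overflowing request saturates instead of wrapping, a subsequent allocation fails cleanly rather than returning a buffer that is too small.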


udp: move gro declarations to net/udp.h · 6db69328