提交 · a8b659e7ff75a6e766bc5691df57ceb26018db9f · openeuler / Kernel

13 2月, 2021 14 次提交

net: dsa: act as passthrough for bridge port flags · a8b659e7

由 Vladimir Oltean 提交于 2月 12, 2021

There are multiple ways in which a PORT_BRIDGE_FLAGS attribute can be
expressed by the bridge through switchdev, and not all of them can be
emulated by DSA mid-layer API at the same time.

One possible configuration is when the bridge offloads the port flags
using a mask that has a single bit set - therefore only one feature
should change. However, DSA currently groups together unicast and
multicast flooding in the .port_egress_floods method, which limits our
options when we try to add support for turning off broadcast flooding:
do we extend .port_egress_floods with a third parameter which b53 and
mv88e6xxx will ignore? But that means that the DSA layer, which
currently implements the PRE_BRIDGE_FLAGS attribute all by itself, will
see that .port_egress_floods is implemented, and will report that all 3
types of flooding are supported - not necessarily true.

Another configuration is when the user specifies more than one flag at
the same time, in the same netlink message. If we were to create one
individual function per offloadable bridge port flag, we would limit the
expressiveness of the switch driver of refusing certain combinations of
flag values. For example, a switch may not have an explicit knob for
flooding of unknown multicast, just for flooding in general. In that
case, the only correct thing to do is to allow changes to BR_FLOOD and
BR_MCAST_FLOOD in tandem, and never allow mismatched values. But having
a separate .port_set_unicast_flood and .port_set_multicast_flood would
not allow the driver to possibly reject that.

Also, DSA doesn't consider it necessary to inform the driver that a
SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
just calls .port_egress_floods for the CPU port. When we'll add support
for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
problem because the flood settings will need to be held statefully in
the DSA middle layer, otherwise changing the mrouter port attribute will
impact the flooding attribute. And that's _assuming_ that the underlying
hardware doesn't have anything else to do when a multicast router
attaches to a port than flood unknown traffic to it. If it does, there
will need to be a dedicated .port_set_mrouter anyway.

So we need to let the DSA drivers see the exact form that the bridge
passes this switchdev attribute in, otherwise we are standing in the
way. Therefore we also need to use this form of language when
communicating to the driver that it needs to configure its initial
(before bridge join) and final (after bridge leave) port flags.

The b53 and mv88e6xxx drivers are converted to the passthrough API and
their implementation of .port_egress_floods is split into two: a
function that configures unicast flooding and another for multicast.
The mv88e6xxx implementation is quite hairy, and it turns out that
the implementations of unknown unicast flooding are actually the same
for 6185 and for 6352:

behind the confusing names actually lie two individual bits:
NO_UNKNOWN_MC -> FLOOD_UC = 0x4 = BIT(2)
NO_UNKNOWN_UC -> FLOOD_MC = 0x8 = BIT(3)

so there was no reason to entangle them in the first place.

Whereas the 6185 writes to MV88E6185_PORT_CTL0_FORWARD_UNKNOWN of
PORT_CTL0, which has the exact same bit index. I have left the
implementations separate though, for the only reason that the names are
different enough to confuse me, since I am not able to double-check with
a user manual. The multicast flooding setting for 6185 is in a different
register than for 6352 though.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a8b659e7

net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes · e18f4c18

由 Vladimir Oltean 提交于 2月 12, 2021

This switchdev attribute offers a counterproductive API for a driver
writer, because although br_switchdev_set_port_flag gets passed a
"flags" and a "mask", those are passed piecemeal to the driver, so while
the PRE_BRIDGE_FLAGS listener knows what changed because it has the
"mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
value. But certain drivers can offload only certain combinations of
settings, like for example they cannot change unicast flooding
independently of multicast flooding - they must be both on or both off.
The way the information is passed to switchdev makes drivers not
expressive enough, and unable to reject this request ahead of time, in
the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
the deferred BRIDGE_FLAGS attribute, where the rejection is currently
ignored.

This patch also changes drivers to make use of the "mask" field for edge
detection when possible.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e18f4c18

net: dsa: configure better brport flags when ports leave the bridge · 5e38c158

由 Vladimir Oltean 提交于 2月 12, 2021

For a DSA switch port operating in standalone mode, address learning
doesn't make much sense since that is a bridge function. In fact,
address learning even breaks setups such as this one:

   +---------------------------------------------+
   |                                             |
   | +-------------------+                       |
   | |        br0        |    send      receive  |
   | +--------+-+--------+ +--------+ +--------+ |
   | |        | |        | |        | |        | |
   | |  swp0  | |  swp1  | |  swp2  | |  swp3  | |
   | |        | |        | |        | |        | |
   +-+--------+-+--------+-+--------+-+--------+-+
          |         ^           |          ^
          |         |           |          |
          |         +-----------+          |
          |                                |
          +--------------------------------+

because if the switch has a single FDB (can offload a single bridge)
then source address learning on swp3 can "steal" the source MAC address
of swp2 from br0's FDB, because learning frames coming from swp2 will be
done twice: first on the swp1 ingress port, second on the swp3 ingress
port. So the hardware FDB will become out of sync with the software
bridge, and when swp2 tries to send one more packet towards swp1, the
ASIC will attempt to short-circuit the forwarding path and send it
directly to swp3 (since that's the last port it learned that address on),
which it obviously can't, because swp3 operates in standalone mode.

So DSA drivers operating in standalone mode should still configure a
list of bridge port flags even when they are standalone. Currently DSA
attempts to call dsa_port_bridge_flags with 0, which disables egress
flooding of unknown unicast and multicast, something which doesn't make
much sense. For the switches that implement .port_egress_floods - b53
and mv88e6xxx, it probably doesn't matter too much either, since they
can possibly inject traffic from the CPU into a standalone port,
regardless of MAC DA, even if egress flooding is turned off for that
port, but certainly not all DSA switches can do that - sja1105, for
example, can't. So it makes sense to use a better common default there,
such as "flood everything".

It should also be noted that what DSA calls "dsa_port_bridge_flags()"
is a degenerate name for just calling .port_egress_floods(), since
nothing else is implemented - not learning, in particular. But disabling
address learning, something that this driver is also coding up for, will
be supported by individual drivers once .port_egress_floods is replaced
with a more generic .port_bridge_flags.

Previous attempts to code up this logic have been in the common bridge
layer, but as pointed out by Ido Schimmel, there are corner cases that
are missed when doing that:
https://patchwork.kernel.org/project/netdevbpf/patch/20210209151936.97382-5-olteanv@gmail.com/

So, at least for now, let's leave DSA in charge of setting port flags
before and after the bridge join and leave.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5e38c158

net: bridge: don't print in br_switchdev_set_port_flag · 078bbb85

由 Vladimir Oltean 提交于 2月 12, 2021

For the netlink interface, propagate errors through extack rather than
simply printing them to the console. For the sysfs interface, we still
print to the console, but at least that's one layer higher than in
switchdev, which also allows us to silently ignore the offloading of
flags if that is ever needed in the future.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

078bbb85

net: bridge: offload all port flags at once in br_setport · 304ae3bf

由 Vladimir Oltean 提交于 2月 12, 2021

If for example this command:

ip link set swp0 type bridge_slave flood off mcast_flood off learning off

succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at
BR_LEARNING, there would be no attempt to revert the partial state in
any way. Arguably, if the user changes more than one flag through the
same netlink command, this one _should_ be all or nothing, which means
it should be passed through switchdev as all or nothing.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

304ae3bf

net: switchdev: propagate extack to port attributes · 4c08c586

由 Vladimir Oltean 提交于 2月 12, 2021

When a struct switchdev_attr is notified through switchdev, there is no
way to report informational messages, unlike for struct switchdev_obj.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reviewed-by: NNikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c08c586

rxrpc: Fix dependency on IPv6 in udp tunnel config · 295f830e

由 Vadim Fedorenko 提交于 2月 12, 2021

As udp_port_cfg struct changes its members with dependency on IPv6
configuration, the code in rxrpc should also check for IPv6.

Fixes: 1a9b86c9 ("rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socket")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NVadim Fedorenko <vfedorenko@novek.ru>
Acked-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

295f830e

mptcp: add netlink event support · b911c97c

由 Florian Westphal 提交于 2月 12, 2021

Allow userspace (mptcpd) to subscribe to mptcp genl multicast events.
This implementation reuses the same event API as the mptcp kernel fork
to ease integration of existing tools, e.g. mptcpd.

Supported events include:
1. start and close of an mptcp connection
2. start and close of subflows (joins)
3. announce and withdrawals of addresses
4. subflow priority (backup/non-backup) change.
Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b911c97c

mptcp: avoid lock_fast usage in accept path · 4d54cc32

由 Florian Westphal 提交于 2月 12, 2021

Once event support is added this may need to allocate memory while msk
lock is held with softirqs disabled.

Not using lock_fast also allows to do the allocation with GFP_KERNEL.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4d54cc32

mptcp: pass subflow socket to a few helpers · 6c714f1b

由 Florian Westphal 提交于 2月 12, 2021

Pass the first/initial subflow to the existing functions so they can
pass this on to the notification handler that is added later in the
series.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c714f1b

mptcp: move subflow close loop after sk close check · b263b0d7

由 Florian Westphal 提交于 2月 12, 2021

In case mptcp socket is already dead the entire mptcp socket
will be freed. We can avoid the close check in this case.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b263b0d7

mptcp: schedule worker when subflow is closed · 40947e13

由 Florian Westphal 提交于 2月 12, 2021

When remote side closes a subflow we should schedule the worker to
dispose of the subflow in a timely manner.

Otherwise, SF_CLOSED event won't be generated until the mptcp
socket itself is closing or local side is closing another subflow.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

40947e13

mptcp: split __mptcp_close_ssk helper · a141e02e

由 Florian Westphal 提交于 2月 12, 2021

Prepare for subflow close events:

When mptcp connection is torn down its enough to send the mptcp socket
close notification rather than a subflow close event for all of the
subflows followed by the mptcp close event.

This splits the helper: mptcp_close_ssk() will emit the close
notification, __mptcp_close_ssk will not.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a141e02e

mptcp: move pm netlink work into pm_netlink · e9801430

由 Florian Westphal 提交于 2月 12, 2021

Allows to make some functions static and avoids acquire of the pm
spinlock in protocol.c.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e9801430

12 2月, 2021 25 次提交

cfg80211/mac80211: Support disabling HE mode · b6db0f89

由 Ben Greear 提交于 2月 04, 2021

Allow user to disable HE mode, similar to how VHT and HT
can be disabled. Useful for testing.
Signed-off-by: NBen Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20210204144610.25971-1-greearb@candelatech.comSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

b6db0f89

mac80211: add STBC encoding to ieee80211_parse_tx_radiotap · 549fdd34

由 Philipp Borgers 提交于 1月 25, 2021

This patch adds support for STBC encoding to the radiotap tx parse
function. Prior to this change adding the STBC flag to the radiotap
header did not encode frames with STBC.
Signed-off-by: NPhilipp Borgers <borgers@mi.fu-berlin.de>
Link: https://lore.kernel.org/r/20210125150744.83065-1-borgers@mi.fu-berlin.de
[use u8_get_bits/u32_encode_bits instead of manually shifting]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

549fdd34

mac80211: minstrel_ht: remove sample rate switching code for constrained devices · c0eb09aa

由 Felix Fietkau 提交于 1月 27, 2021

This was added to mitigate the effects of too much sampling on devices that
use a static global fallback table instead of configurable multi-rate retry.
Now that the sampling algorithm is improved, this code path no longer performs
any better than the standard probing on affected devices.
Signed-off-by: NFelix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-6-nbd@nbd.nameSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

c0eb09aa

mac80211: minstrel_ht: show sampling rates in debugfs · 4a8d0c99

由 Felix Fietkau 提交于 1月 27, 2021

This makes it easier to see what rates are going to be tested next
Signed-off-by: NFelix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-5-nbd@nbd.nameSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

4a8d0c99

mac80211: minstrel_ht: significantly redesign the rate probing strategy · 80d55154

由 Felix Fietkau 提交于 1月 27, 2021

The biggest flaw in current minstrel_ht is the fact that it needs way too
many probing packets to be able to quickly find the best rate.
Depending on the wifi hardware and operating mode, this can significantly
reduce throughput when not operating at the highest available data rate.

In order to be able to significantly reduce the amount of rate sampling,
we need a much smarter selection of probing rates.

The new approach introduced by this patch maintains a limited set of
available rates to be tested during a statistics window.

They are split into distinct categories:
- MINSTREL_SAMPLE_TYPE_INC - incremental rate upgrade:
Pick the next rate group and find the first rate that is faster than
the current max. throughput rate
- MINSTREL_SAMPLE_TYPE_JUMP - random testing of higher rates:
Pick a random rate from the next group that is faster than the current
max throughput rate. This allows faster adaptation when the link changes
significantly
- MINSTREL_SAMPLE_TYPE_SLOW - test a rate between max_prob, max_tp2 and
max_tp in order to reduce the gap between them

In order to prioritize sampling, every 6 attempts are split into 3x INC,
2x JUMP, 1x SLOW.

Available rates are checked and refilled on every stats window update.

With this approach, we finally get a very small delta in throughput when
comparing setting the optimal data rate as a fixed rate vs normal rate
control operation.
Signed-off-by: NFelix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-4-nbd@nbd.nameSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

80d55154

mac80211: minstrel_ht: reduce the need to sample slower rates · 7aece471

由 Felix Fietkau 提交于 1月 27, 2021

In order to more gracefully be able to fall back to lower rates without too
much throughput fluctuations, initialize all untested rates below tested ones
to the maximum probabilty of higher rates.
Usually this leads to untested lower rates getting initialized with a
probability value of 100%, making them better candidates for fallback without
having to rely on random probing
Signed-off-by: NFelix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-3-nbd@nbd.nameSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

7aece471

mac80211: minstrel_ht: update total packets counter in tx status path · 2012a2f7

由 Felix Fietkau 提交于 1月 27, 2021

Keep the update in one place and prepare for further rework
Signed-off-by: NFelix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-2-nbd@nbd.nameSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

2012a2f7

mac80211: minstrel_ht: use bitfields to encode rate indexes · a42fa256

由 Felix Fietkau 提交于 1月 27, 2021

Get rid of a lot of divisions and modulo operations
Reduces code size and improves performance
Signed-off-by: NFelix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20210127055735.78599-1-nbd@nbd.nameSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

a42fa256

cfg80211: initialize reg_rule in __freq_reg_info() · 9e6d5126

由 Luca Coelho 提交于 2月 04, 2021

Sparse started warning on this function because we can potentially
return an uninitialized value. The reason is that if the caller
passes a min_bw value that is higher then the last value in bws[], we
will not go into the loop and reg_rule will remain initialized. This
cannot happen because the only caller of this function uses either 1
or 20 in min_bw, but the function will be more robust if we
pre-initialize the value.
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210204154439.6c884ea7281c.I257278d03b0c1ae0aa6631672cfa48f1a95d5996@changeidSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

9e6d5126

mac80211: fix potential overflow when multiplying to u32 integers · 6194f7e6

由 Colin Ian King 提交于 2月 05, 2021

The multiplication of the u32 variables tx_time and estimated_retx is
performed using a 32 bit multiplication and the result is stored in
a u64 result. This has a potential u32 overflow issue, so avoid this
by casting tx_time to a u64 to force a 64 bit multiply.

Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 050ac52c ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol")
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210205175352.208841-1-colin.king@canonical.comSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

6194f7e6

mac80211: enable QoS support for nl80211 ctrl port · 10cb8e61

由 Markus Theil 提交于 2月 06, 2021

This patch unifies sending control port frames
over nl80211 and AF_PACKET sockets a little more.

Before this patch, EAPOL frames got QoS prioritization
only when using AF_PACKET sockets.

__ieee80211_select_queue only selects a QoS-enabled queue
for control port frames, when the control port protocol
is set correctly on the skb. For the AF_PACKET path this
works, but the nl80211 path used ETH_P_802_3.

Another check for injected frames in wme.c then prevented
the QoS TID to be copied in the frame.

In order to fix this, get rid of the frame injection marking
for nl80211 ctrl port and set the correct ethernet protocol.

Please note:
An erlier version of this path tried to prevent
frame aggregation for control port frames in order to speed up
the initial connection setup a little. This seemed to cause
issues on my older Intel dvm-based hardware, and was therefore
removed again. Future commits which try to reintroduce this
have to check carefully how hw behaves with aggregated and
non-aggregated traffic for the same TID.
My NIC: Intel(R) Centrino(R) Ultimate-N 6300 AGN, REV=0x74
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NMarkus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20210206115112.567881-1-markus.theil@tu-ilmenau.deSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

10cb8e61

cfg80211: remove unused callback · 258afa78

由 Matteo Croce 提交于 2月 08, 2021

The ieee80211 class registers a callback which actually does nothing.
Given that the callback is optional, and all its accesses are protected
by a NULL check, remove it entirely.
Signed-off-by: NMatteo Croce <mcroce@microsoft.com>
Link: https://lore.kernel.org/r/20210208113356.4105-1-mcroce@linux.microsoft.comSigned-off-by: NJohannes Berg <johannes.berg@intel.com>

258afa78

net/tls: Select SOCK_RX_QUEUE_MAPPING from TLS_DEVICE · 76f16593

由 Tariq Toukan 提交于 2月 11, 2021

Compile-in the socket RX queue mapping field and logic when TLS_DEVICE
is enabled. This allows device drivers to pick the recorded socket's
RX queue and use it for streams distribution.
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76f16593

net/sock: Add kernel config SOCK_RX_QUEUE_MAPPING · 4e1beecc

由 Tariq Toukan 提交于 2月 11, 2021

Use a new config SOCK_RX_QUEUE_MAPPING to compile-in the socket
RX queue field and logic, instead of the XPS config.
This breaks dependency in XPS, and allows selecting it from non-XPS
use cases, as we do in the next patch.

In addition, use the new flag to wrap the logic in sk_rx_queue_get()
and protect access to the sk_rx_queue_mapping field, while keeping
the function exposed unconditionally, just like sk_rx_queue_set()
and sk_rx_queue_clear().
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e1beecc

tcp: Sanitize CMSG flags and reserved args in tcp_zerocopy_receive. · 3c5a2fd0

由 Arjun Roy 提交于 2月 11, 2021

Explicitly define reserved field and require it and any subsequent
fields to be zero-valued for now. Additionally, limit the valid CMSG
flags that tcp_zerocopy_receive accepts.

Fixes: 7eeba170 ("tcp: Add receive timestamp support for receive zerocopy.")
Signed-off-by: NArjun Roy <arjunroy@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Suggested-by: NDavid Ahern <dsahern@gmail.com>
Suggested-by: NLeon Romanovsky <leon@kernel.org>
Suggested-by: NJakub Kicinski <kuba@kernel.org>
Acked-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c5a2fd0

net: fix dev_ifsioc_locked() race condition · 3b23a32a

由 Cong Wang 提交于 2月 11, 2021

dev_ifsioc_locked() is called with only RCU read lock, so when
there is a parallel writer changing the mac address, it could
get a partially updated mac address, as shown below:

Thread 1			Thread 2
// eth_commit_mac_addr_change()
memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
				// dev_ifsioc_locked()
				memcpy(ifr->ifr_hwaddr.sa_data,
					dev->dev_addr,...);

Close this race condition by guarding them with a RW semaphore,
like netdev_get_name(). We can not use seqlock here as it does not
allow blocking. The writers already take RTNL anyway, so this does
not affect the slow path. To avoid bothering existing
dev_set_mac_address() callers in drivers, introduce a new wrapper
just for user-facing callers on ioctl and rtnetlink paths.

Note, bonding also changes slave mac addresses but that requires
a separate patch due to the complexity of bonding code.

Fixes: 3710becf ("net: RCU locking for simple ioctl()")
Reported-by: N"Gong, Sishuai" <sishuai@purdue.edu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: NCong Wang <cong.wang@bytedance.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b23a32a

net: fib_notifier: don't return positive values on fib registration · 6f199552

由 Vlad Buslov 提交于 2月 11, 2021

The function fib6_walk_continue() cannot return a positive value when
called from register_fib_notifier(), but ignoring causes static analyzer to
generate warnings in users of register_fib_notifier() that try to convert
returned error code to pointer with ERR_PTR(). Handle such case by
explicitly checking for positive error values and converting them to
-EINVAL in fib6_tables_dump().
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Suggested-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f199552

net: ipconfig: avoid use-after-free in ic_close_devs · f68cbaed

由 Vladimir Oltean 提交于 2月 11, 2021

Due to the fact that ic_dev->dev is kept open in ic_close_dev, I had
thought that ic_dev will not be freed either. But that is not the case,
but instead "everybody dies" when ipconfig cleans up, and just the
net_device behind ic_dev->dev remains allocated but not ic_dev itself.

This is a problem because in ic_close_devs, for every net device that
we're about to close, we compare it against the list of lower interfaces
of ic_dev, to figure out whether we should close it or not. But since
ic_dev itself is subject to freeing, this means that at some point in
the middle of the list of ipconfig interfaces, ic_dev will have been
freed, and we would be still attempting to iterate through its list of
lower interfaces while checking whether to bring down the remaining
ipconfig interfaces.

There are multiple ways to avoid the use-after-free: we could delay
freeing ic_dev until the very end (outside the while loop). Or an even
simpler one: we can observe that we don't need ic_dev when iterating
through its lowers, only ic_dev->dev, structure which isn't ever freed.
So, by keeping ic_dev->dev in a variable assigned prior to freeing
ic_dev, we can avoid all use-after-free issues.

Fixes: 46acf7bd ("Revert "net: ipv4: handle DSA enabled master network devices"")
Reported-by: Nkernel test robot <oliver.sang@intel.com>
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f68cbaed

net: initialize net->net_cookie at netns setup · 3d368ab8

由 Eric Dumazet 提交于 2月 10, 2021

It is simpler to make net->net_cookie a plain u64
written once in setup_net() instead of looping
and using atomic64 helpers.

Lorenz Bauer wants to add SO_NETNS_COOKIE socket option
and this patch would makes his patch series simpler.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Tested-by: NLorenz Bauer <lmb@cloudflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d368ab8

net: dsa: xrs700x: add HSR offloading support · bd62e6f5

由 George McCollister 提交于 2月 09, 2021

Add offloading for HSR/PRP (IEC 62439-3) tag insertion, tag removal
forwarding and duplication supported by the xrs7000 series switches.

Only HSR v1 and PRP v1 are supported by the xrs7000 series switches (HSR
v0 is not).
Signed-off-by: NGeorge McCollister <george.mccollister@gmail.com>
Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd62e6f5

net: dsa: add support for offloading HSR · 18596f50

由 George McCollister 提交于 2月 09, 2021

Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion
tag removal, duplicate generation and forwarding on DSA switches.

Add DSA_NOTIFIER_HSR_JOIN and DSA_NOTIFIER_HSR_LEAVE which trigger calls
to .port_hsr_join and .port_hsr_leave in the DSA driver for the switch.

The DSA switch driver should then set netdev feature flags for the
HSR/PRP operation that it offloads.
    NETIF_F_HW_HSR_TAG_INS
    NETIF_F_HW_HSR_TAG_RM
    NETIF_F_HW_HSR_FWD
    NETIF_F_HW_HSR_DUP
Signed-off-by: NGeorge McCollister <george.mccollister@gmail.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

18596f50

net: hsr: add offloading support · dcf0cd1c

由 George McCollister 提交于 2月 09, 2021

Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion
tag removal, duplicate generation and forwarding.

For HSR, insertion involves the switch adding a 6 byte HSR header after
the 14 byte Ethernet header. For PRP it adds a 6 byte trailer.

Tag removal involves automatically stripping the HSR/PRP header/trailer
in the switch. This is possible when the switch also performs auto
deduplication using the HSR/PRP header/trailer (making it no longer
required).

Forwarding involves automatically forwarding between redundant ports in
an HSR. This is crucial because delay is accumulated as a frame passes
through each node in the ring.

Duplication involves the switch automatically sending a single frame
from the CPU port to both redundant ports. This is required because the
inserted HSR/PRP header/trailer must contain the same sequence number
on the frames sent out both redundant ports.

Export is_hsr_master so DSA can tell them apart from other devices in
dsa_slave_changeupper.
Signed-off-by: NGeorge McCollister <george.mccollister@gmail.com>
Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dcf0cd1c

net: hsr: generate supervision frame without HSR/PRP tag · 78be9217

由 George McCollister 提交于 2月 09, 2021

For a switch to offload insertion of HSR/PRP tags, frames must not be
sent to the CPU facing switch port with a tag. Generate supervision frames
(eth type ETH_P_PRP) without HSR v1 (ETH_P_HSR)/PRP tag and rely on
create_tagged_frame which inserts it later. This will allow skipping the
tag insertion for all outgoing frames in the future which is required for
HSR v1/PRP tag insertions to be offloaded.

HSR v0 supervision frames always contain tag information so insertion of
the tag can't be offloaded. IEC 62439-3 Ed.2.0 (HSR v1) specifically
notes that this was changed since v0 to allow offloading.
Signed-off-by: NGeorge McCollister <george.mccollister@gmail.com>
Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
Tested-by: NVladimir Oltean <olteanv@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

78be9217

tcp: add some entropy in __inet_hash_connect() · c579bd1b

由 Eric Dumazet 提交于 2月 09, 2021

Even when implementing RFC 6056 3.3.4 (Algorithm 4: Double-Hash
Port Selection Algorithm), a patient attacker could still be able
to collect enough state from an otherwise idle host.

Idea of this patch is to inject some noise, in the
cases __inet_hash_connect() found a candidate in the first
attempt.

This noise should not significantly reduce the collision
avoidance, and should be zero if connection table
is already well used.

Note that this is not implementing RFC 6056 3.3.5
because we think Algorithm 5 could hurt typical
workloads.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: David Dworken <ddworken@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c579bd1b

tcp: change source port randomizarion at connect() time · 190cc824

由 Eric Dumazet 提交于 2月 09, 2021

RFC 6056 (Recommendations for Transport-Protocol Port Randomization)
provides good summary of why source selection needs extra care.

David Dworken reminded us that linux implements Algorithm 3
as described in RFC 6056 3.3.3

Quoting David :
   In the context of the web, this creates an interesting info leak where
   websites can count how many TCP connections a user's computer is
   establishing over time. For example, this allows a website to count
   exactly how many subresources a third party website loaded.
   This also allows:
   - Distinguishing between different users behind a VPN based on
       distinct source port ranges.
   - Tracking users over time across multiple networks.
   - Covert communication channels between different browsers/browser
       profiles running on the same computer
   - Tracking what applications are running on a computer based on
       the pattern of how fast source ports are getting incremented.

Section 3.3.4 describes an enhancement, that reduces
attackers ability to use the basic information currently
stored into the shared 'u32 hint'.

This change also decreases collision rate when
multiple applications need to connect() to
different destinations.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NDavid Dworken <ddworken@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

190cc824

11 2月, 2021 1 次提交

rxrpc: Fix missing dependency on NET_UDP_TUNNEL · dc0e6056

由 David Howells 提交于 2月 09, 2021

The changes to make rxrpc create the udp socket missed a bit to add the
Kconfig dependency on the udp tunnel code to do this.

Fix this by adding making AF_RXRPC select NET_UDP_TUNNEL.

Fixes: 1a9b86c9 ("rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socket")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NVadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NXin Long <lucien.xin@gmail.com>
cc: alaa@dev.mellanox.co.il
cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dc0e6056

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功