Commits · 1b1c7a0ef7f323f37281b134ade17baa94779787 · openeuler / Kernel

30 Mar, 2020 4 commits

mptcp: Add path manager interface · 1b1c7a0e

Committed by Peter Krystad on Mar 27, 2020

Add enough of a path manager interface to allow sending of ADD_ADDR
when an incoming MPTCP connection is created. Capable of sending only
a single IPv4 ADD_ADDR option. The 'pm_data' element of the connection
sock will need to be expanded to handle multiple interfaces and IPv6.
Partial processing of the incoming ADD_ADDR is included so the path
manager notification of that event happens at the proper time, which
involves validating the incoming address information.

This is a skeleton interface definition for events generated by
MPTCP.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b1c7a0e

mptcp: Add ADD_ADDR handling · 3df523ab

Committed by Peter Krystad on Mar 27, 2020

Add handling for sending and receiving the ADD_ADDR, ADD_ADDR6,
and RM_ADDR suboptions.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3df523ab

net: Fix typo of SKB_SGO_CB_OFFSET · a08e7fd9

Committed by Cambda Zhu on Mar 26, 2020

The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the
offset of the GSO in skb cb. This patch fixes the typo.

Fixes: 9207f9d4 ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a08e7fd9

net: page pool: allow to pass zero flags to page_pool_init() · 798dda81

Committed by Denis Kirjanov on Mar 25, 2020

The page pool API can be useful for non-DMA cases such as the
xen-netfront driver, so allow zero flags to be passed to
page_pool_init().

v2: check DMA direction only if PP_FLAG_DMA_MAP is set

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

798dda81

28 Mar, 2020 2 commits

net: dsa: implement auto-normalization of MTU for bridge hardware datapath · bff33f7e

Committed by Vladimir Oltean on Mar 27, 2020

Many switches don't have an explicit knob for configuring the MTU
(maximum transmission unit per interface).  Instead, they do the
length-based packet admission checks on the ingress interface, for
reasons that are easy to understand (why would you accept a packet in
the queuing subsystem if you know you're going to drop it anyway).

So it is actually the MRU that these switches permit configuring.

In Linux there only exists the IFLA_MTU netlink attribute and the
associated dev_set_mtu function. The comments like to play blind and say
that it's changing the "maximum transfer unit", which is to say that
there isn't any directionality in the meaning of the MTU word. So that
is the interpretation that this patch is giving to things: MTU == MRU.

When 2 interfaces having different MTUs are bridged, the bridge driver
MTU auto-adjustment logic kicks in: what br_mtu_auto_adjust() does is it
adjusts the MTU of the bridge net device itself (and not that of the
slave net devices) to the minimum value of all slave interfaces, in
order for forwarded packets to not exceed the MTU regardless of the
interface they are received and sent on.

The idea behind this behavior, and why the slave MTUs are not adjusted,
is that normal termination from Linux over the L2 forwarding domain
should happen over the bridge net device, which _is_ properly limited by
the minimum MTU. And termination over individual slave devices is
possible even if those are bridged. But that is not "forwarding", so
there's no reason to do normalization there, since only a single
interface sees that packet.

The problem with those switches that can only control the MRU is with
the offloaded data path, where a packet received on an interface with
MRU 9000 would still be forwarded to an interface with MRU 1500. And the
br_mtu_auto_adjust() function does not really help, since the MTU
configured on the bridge net device is ignored.

In order to enforce the de-facto MTU == MRU rule for these switches, we
need to do MTU normalization, which means: in order for no packet larger
than the MTU configured on this port to be sent, then we need to limit
the MRU on all ports that this packet could possibly come from. AKA
since we are configuring the MRU via MTU, it means that all ports within
a bridge forwarding domain should have the same MTU.

And that is exactly what this patch is trying to do.

From an implementation perspective, we try to follow the intent of the
user, otherwise there is a risk that we might livelock them (they try to
change the MTU on an already-bridged interface, but we just keep
changing it back in an attempt to keep the MTU normalized). So the MTU
that the bridge is normalized to is either:

 - The most recently changed one:

   ip link set dev swp0 master br0
   ip link set dev swp1 master br0
   ip link set dev swp0 mtu 1400

   This sequence will make swp1 inherit MTU 1400 from swp0.

 - The one of the most recently added interface to the bridge:

   ip link set dev swp0 master br0
   ip link set dev swp1 mtu 1400
   ip link set dev swp1 master br0

   The above sequence will make swp0 inherit MTU 1400 as well.

Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bff33f7e
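
The normalization rule described in the commit above ("most recently changed" or "most recently added" MTU wins) can be illustrated with a small userspace model. This is only a sketch in Python, not the DSA implementation: the `Bridge` class, its method names, and the starting MTU of 1500 are assumptions made for illustration.

```python
class Bridge:
    """Toy model of a bridge whose offloaded ports must all share one MTU."""

    def __init__(self):
        self.ports = {}  # port name -> MTU

    def _normalize(self, mtu):
        # Every port in the forwarding domain ends up with the same MTU.
        for name in self.ports:
            self.ports[name] = mtu

    def add_port(self, name, mtu):
        # "ip link set dev <name> master br0": the newcomer's MTU wins.
        self.ports[name] = mtu
        self._normalize(mtu)

    def set_mtu(self, name, mtu):
        # "ip link set dev <name> mtu <mtu>": the latest change wins.
        if name not in self.ports:
            raise KeyError(name)
        self._normalize(mtu)


# First sequence from the commit message (1500 is an assumed starting MTU).
br_a = Bridge()
br_a.add_port("swp0", 1500)
br_a.add_port("swp1", 1500)
br_a.set_mtu("swp0", 1400)
first = dict(br_a.ports)

# Second sequence: swp1 already has MTU 1400 when it joins the bridge.
br_b = Bridge()
br_b.add_port("swp0", 1500)
br_b.add_port("swp1", 1400)
second = dict(br_b.ports)
```

In both sequences every bridged port ends up at 1400, which is the behaviour the two shell examples in the commit message describe.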

net: dsa: configure the MTU for switch ports · bfcb8132

Committed by Vladimir Oltean on Mar 27, 2020

It is useful to be able to configure port policers on a switch to accept
frames of various sizes:

- Increase the MTU for better throughput from the default of 1500 if it
  is known that there is no 10/100 Mbps device in the network.
- Decrease the MTU to limit the latency of high-priority frames under
  congestion, or work around various network segments that add extra
  headers to packets which can't be fragmented.

For DSA slave ports, this is mostly a pass-through callback, called
through the regular ndo ops and at probe time (to ensure consistency
across all supported switches).

The CPU port is called with an MTU equal to the largest configured MTU
of the slave ports. The assumption is that the user might want to
sustain a bidirectional conversation with a partner over any switch
port.

The DSA master is configured the same as the CPU port, plus the tagger
overhead. Since the MTU is by definition L2 payload (sans Ethernet
header), it is up to each individual driver to figure out if it needs to
do anything special for its frame tags on the CPU port (it shouldn't
except in special cases). So the MTU does not contain the tagger
overhead on the CPU port.
However the MTU of the DSA master, minus the tagger overhead, is used as
a proxy for the MTU of the CPU port, which does not have a net device.
This is to avoid uselessly calling the .change_mtu function on the CPU
port when nothing should change.

So it is safe to assume that the DSA master and the CPU port MTUs are
apart by exactly the tagger's overhead in bytes.

Some changes were made around dsa_master_set_mtu(), a function which has
now been removed, for 2 reasons:
  - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to
    do the same thing in DSA
  - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method
That is to say, there's no need for this function in DSA, we can safely
call dev_set_mtu() directly, take the rtnl lock when necessary, and just
propagate whatever errors get reported (since the user probably wants to
be informed).

Some inspiration (mainly in the MTU DSA notifier) was taken from a
vaguely similar patch from Murali and Florian, who are credited as
co-developers down below.

Co-developed-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
Co-developed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bfcb8132
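
The MTU relationships stated in the commit above (CPU port MTU equals the largest slave MTU; DSA master MTU equals the CPU port MTU plus the tagger overhead) amount to simple arithmetic. The Python sketch below models only that arithmetic; the function names, port names, and the 4-byte tagger overhead are hypothetical values chosen for illustration.

```python
def cpu_port_mtu(slave_mtus):
    """CPU port MTU: the largest MTU configured on any slave port."""
    return max(slave_mtus.values())


def master_mtu(slave_mtus, tagger_overhead):
    """DSA master MTU: CPU port MTU plus the tagger's overhead in bytes."""
    return cpu_port_mtu(slave_mtus) + tagger_overhead


# Hypothetical configuration: three user ports and a 4-byte tag.
slaves = {"swp0": 1500, "swp1": 9000, "swp2": 1400}
overhead = 4

cpu = cpu_port_mtu(slaves)
master = master_mtu(slaves, overhead)
```

Subtracting the overhead from the master MTU recovers the CPU port MTU, which is the "proxy" relationship the commit message relies on.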

27 Mar, 2020 17 commits

net: introduce the MACSEC netdev feature · 5908220b

Committed by Antoine Tenart on Mar 25, 2020

This patch introduces a new netdev feature, which will be used by drivers
to state they can perform MACsec transformations in hardware.

The patchset was gathered by Mark, macsec functionality itself
was implemented by Dmitry, Mark and Pavel Belous.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5908220b

ipv6: ndisc: add support for 'PREF64' dns64 prefix identifier · c24a77ed

Committed by Maciej Żenczykowski on Mar 23, 2020

This is trivial since we already have support for the entirely
identical (from the kernel's point of view) RDNSS, DNSSL, etc. that
also contain opaque data that needs to be passed down to userspace
for further processing.

As specified in draft-ietf-6man-ra-pref64-09 (while it is still a draft,
it is purely waiting on the RFC Editor for cleanups and publishing):
  PREF64 option contains lifetime and a (up to) 96-bit IPv6 prefix.

The 8-bit identifier of the option type as assigned by the IANA is 38.

Since we lack DNS64/NAT64/CLAT support in the kernel at the moment,
this option should also be passed on to userland.

See:
  https://tools.ietf.org/html/draft-ietf-6man-ra-pref64-09
  https://www.iana.org/assignments/icmpv6-parameters/icmpv6-parameters.xhtml#icmpv6-parameters-5

Cc: Erik Kline <ek@google.com>
Cc: Jen Linkova <furry@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Michael Haro <mharo@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Acked-By: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c24a77ed

cls_flower: Add extack support for flags key · e304e21a

Committed by Guillaume Nault on Mar 23, 2020

Pass extack down to fl_set_key_flags() and set message on error.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e304e21a

cls_flower: Add extack support for src and dst port range options · bd7d4c12

Committed by Guillaume Nault on Mar 23, 2020

Pass extack down to fl_set_key_port_range() and set message on error.

Both the min and max ports would qualify as invalid attributes here.
Report the min one as invalid, as it's probably what makes the most
sense from a user point of view.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd7d4c12

cls_flower: Add extack support for mpls options · 442f730e

Committed by Guillaume Nault on Mar 23, 2020

Pass extack down to fl_set_key_mpls() and set message on error.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

442f730e

devlink: implement DEVLINK_CMD_REGION_NEW · b9a17abf

Committed by Jacob Keller on Mar 26, 2020

Implement support for the DEVLINK_CMD_REGION_NEW command for creating
snapshots. This new command parallels the existing
DEVLINK_CMD_REGION_DEL.

In order for DEVLINK_CMD_REGION_NEW to work for a region, the new
".snapshot" operation must be implemented in the region's ops structure.

The desired snapshot id must be provided. This helps avoid confusion on
the purpose of DEVLINK_CMD_REGION_NEW, and keeps the API simpler.

The requested id will be inserted into the xarray tracking the number of
snapshots using each id. If this id is already used by another snapshot
on any region, an error will be returned.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9a17abf

devlink: track snapshot id usage count using an xarray · 12102436

Committed by Jacob Keller on Mar 26, 2020

Each snapshot created for a devlink region must have an id. These ids
are supposed to be unique per "event" that caused the snapshot to be
created. Drivers call devlink_region_snapshot_id_get to obtain a new id
to use for a new event trigger. The id values are tracked per devlink,
so that the same id number can be used if a triggering event creates
multiple snapshots on different regions.

There is no mechanism for snapshot ids to ever be reused. Introduce an
xarray to store the count of how many snapshots are using a given id,
replacing the snapshot_id field previously used for picking the next id.

The devlink_region_snapshot_id_get() function will use xa_alloc to
insert an initial value of 1 at an available slot between 0 and
U32_MAX.

The new __devlink_snapshot_id_increment() and
__devlink_snapshot_id_decrement() functions will be used to track how
many snapshots currently use an id.

Drivers must now call devlink_snapshot_id_put() in order to release
their reference of the snapshot id after adding region snapshots.

By tracking the total number of snapshots using a given id, it is
possible for the decrement() function to erase the id from the xarray
when it is not in use.

With this method, a snapshot id can become reused again once all
snapshots that referred to it have been deleted via
DEVLINK_CMD_REGION_DEL, and the driver has finished adding snapshots.

This work also paves the way to introduce a mechanism for userspace to
request a snapshot.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12102436
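
The usage-count scheme from the commit above can be modelled in a few lines. The following Python sketch is not the devlink code: a plain dictionary stands in for the xarray, the class and method names are illustrative, and it assumes the allocator hands out the lowest free id.

```python
class SnapshotIds:
    """Toy model of per-devlink snapshot id usage counts."""

    def __init__(self):
        self.counts = {}  # snapshot id -> number of current users

    def id_get(self):
        # Allocate a free id with an initial count of 1 (the driver's
        # own reference). Assumption: the lowest free id is chosen.
        new_id = 0
        while new_id in self.counts:
            new_id += 1
        self.counts[new_id] = 1
        return new_id

    def increment(self, snap_id):
        # A region snapshot using this id was created.
        self.counts[snap_id] += 1

    def decrement(self, snap_id):
        # A user went away; erase the id once nothing uses it.
        self.counts[snap_id] -= 1
        if self.counts[snap_id] == 0:
            del self.counts[snap_id]

    def id_put(self, snap_id):
        # The driver releases its reference after adding snapshots.
        self.decrement(snap_id)


ids = SnapshotIds()
sid = ids.id_get()
ids.increment(sid)   # snapshot on a first region
ids.increment(sid)   # snapshot on a second region, same event
ids.id_put(sid)      # driver has finished adding snapshots
ids.decrement(sid)   # first snapshot deleted
still_in_use = sid in ids.counts
ids.decrement(sid)   # second snapshot deleted
freed = sid not in ids.counts
reused = ids.id_get()
```

The id only becomes available again after both the driver's reference and every snapshot that used it are gone, matching the reuse condition described in the commit message.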

devlink: report error once U32_MAX snapshot ids have been used · 7ef19d3b

Committed by Jacob Keller on Mar 26, 2020

The devlink_snapshot_id_get() function returns a snapshot id. The
snapshot id is a u32, so there is no way to indicate an error code.

A future change is going to possibly add additional cases where this
function could fail. Refactor the function to return the snapshot id in
an argument, so that it can return zero or an error value.

This ensures that snapshot ids cannot be confused with error values, and
aids in the future refactor of snapshot id allocation management.

Because there is currently no way to release previously used snapshot ids,
add a simple check ensuring that an error is reported in case the
snapshot_id would overflow.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ef19d3b

devlink: extract snapshot id allocation to helper function · 7000108f

Committed by Jacob Keller on Mar 26, 2020

A future change is going to implement a new devlink command to request
a snapshot on demand. As part of this, the logic for handling the
snapshot ids will be refactored. To simplify the snapshot id allocation
function, move it to a separate function prefixed by `__`. This helper
function will assume the lock is held.

While no other callers will exist, it simplifies refactoring the logic
because there is no need to complicate the function with gotos to handle
unlocking on failure.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7000108f

devlink: use -ENOSPC to indicate no more room for snapshots · 47a39f61

Committed by Jacob Keller on Mar 26, 2020

The devlink_region_snapshot_create function returns -ENOMEM when the
maximum number of snapshots has been reached. This is confusing because
it is not an issue of being out of memory. Change this to use -ENOSPC
instead.

Reported-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

47a39f61

devlink: add function to take snapshot while locked · cf80faee

Committed by Jacob Keller on Mar 26, 2020

A future change is going to add a new devlink command to request
a snapshot on demand. This function will want to call the
devlink_region_snapshot_create function while already holding the
devlink instance lock.

Extract the logic of this function into a static function prefixed by
`__` to indicate that it is an internal helper function. Modify the
original function to be implemented in terms of the new locked
function.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf80faee

devlink: trivial: fix tab in function documentation · 6d82f67e

Committed by Jacob Keller on Mar 26, 2020

The function documentation comment for devlink_region_snapshot_create
included a literal tab character between 'future analyses' that was
difficult to spot as it happened to only display as one space wide.

Fix the comment to use a space here instead of a stray tab appearing in
the middle of a sentence.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d82f67e

devlink: convert snapshot destructor callback to region op · a0a09f6b

Committed by Jacob Keller on Mar 26, 2020

It does not make sense that two snapshots for a given region would use
different destructors. Simplify snapshot creation by adding
a .destructor op for regions.

This operation will replace the data_destructor for the snapshot
creation, and makes snapshot creation easier.

Noticed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0a09f6b

devlink: prepare to support region operations · e8937681

Committed by Jacob Keller on Mar 26, 2020

Modify the devlink region code in preparation for adding new operations
on regions.

Create a devlink_region_ops structure, and move the name pointer from
within the devlink_region structure into the ops structure (similar to
the devlink_health_reporter_ops).

This prepares the regions to enable support of additional operations in
the future such as requesting snapshots, or accessing the region
directly without a snapshot.

In order to re-use the constant strings in the mlx4 driver their
declaration must be changed to 'const char * const' to ensure the
compiler realizes that both the data and the pointer cannot change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e8937681

sched: act_pedit: Implement stats_update callback · d4d9d9c5

Committed by Petr Machata on Mar 26, 2020

Implement this callback in order to get the offloaded stats added to the
kernel stats.

Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d4d9d9c5

sched: act_skbedit: Implement stats_update callback · 837cb17d

Committed by Petr Machata on Mar 26, 2020

Implement this callback in order to get the offloaded stats added to the
kernel stats.

Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

837cb17d

tipc: Add a missing case of TIPC_DIRECT_MSG type · 8b1e5b0a

Committed by Hoang Le on Mar 26, 2020

In commit f73b1281
("tipc: improve throughput between nodes in netns"), we missed a check
to handle the TIPC_DIRECT_MSG type, so it still uses the old sending
mechanism for this message type. As a result, the throughput improvement
is not as significant as expected.

Besides that, when sending a large message of that type, we also
handle the wrong receiving queue: it should be enqueued in the socket
receiving queue instead of the multicast messages queue.

Fix this by adding the missing case for TIPC_DIRECT_MSG.

Fixes: f73b1281 ("tipc: improve throughput between nodes in netns")
Reported-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b1e5b0a

26 Mar, 2020 8 commits

mac80211: set IEEE80211_TX_CTRL_PORT_CTRL_PROTO for nl80211 TX · b95d2ccd

Committed by Johannes Berg on Mar 26, 2020

When a frame is transmitted via the nl80211 TX rather than as a
normal frame, IEEE80211_TX_CTRL_PORT_CTRL_PROTO wasn't set and
this will lead to wrong decisions (rate control etc.) being made
about the frame; fix this.

Fixes: 91180649 ("mac80211: Add support for tx_control_port")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20200326155333.f183f52b02f0.I4054e2a8c11c2ddcb795a0103c87be3538690243@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

b95d2ccd

mac80211: mark station unauthorized before key removal · b16798f5

Committed by Johannes Berg on Mar 26, 2020

If a station is still marked as authorized, mark it as no longer
so before removing its keys. This allows frames transmitted to it
to be rejected, providing additional protection against leaking
plain text data during the disconnection flow.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200326155133.ccb4fb0bb356.If48f0f0504efdcf16b8921f48c6d3bb2cb763c99@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

b16798f5

mac80211: Check port authorization in the ieee80211_tx_dequeue() case · ce2e1ca7

Committed by Jouni Malinen on Mar 26, 2020

mac80211 used to check port authorization in the Data frame enqueue case
when going through start_xmit(). However, that authorization status may
change while the frame is waiting in a queue. Add a similar check in the
dequeue case to avoid sending previously accepted frames after
authorization change. This provides additional protection against
potential leaking of frames after a station has been disconnected and
the keys for it are being removed.

Cc: stable@vger.kernel.org
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200326155133.ced84317ea29.I34d4c47cd8cc8a4042b38a76f16a601fbcbfd9b3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

ce2e1ca7

cfg80211: Do not warn on same channel at the end of CSA · 05dcb8bb

Committed by Ilan Peer on Mar 26, 2020

When cfg80211_update_assoc_bss_entry() is called, there is a
verification that the BSS channel actually changed. As some APs use
CSA also for bandwidth changes, this would result in a kernel
warning.

Fix this by removing the WARN_ON().

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20200326150855.96316ada0e8d.I6710376b1b4257e5f4712fc7ab16e2b638d512aa@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

05dcb8bb

mac80211: drop data frames without key on encrypted links · a0761a30

Committed by Johannes Berg on Mar 26, 2020

If we know that we have an encrypted link (based on having had
a key configured for TX in the past) then drop all data frames
in the key selection handler if there's no key anymore.

This fixes an issue with mac80211 internal TXQs - there we can
buffer frames for an encrypted link, but then if the key is no
longer there when they're dequeued, the frames are sent without
encryption. This happens if a station is disconnected while the
frames are still on the TXQ.

Detecting that a link should be encrypted based on a first key
having been configured for TX is fine as there are no use cases
for a connection going from with encryption to no encryption.
With extended key IDs, however, there is a case of having a key
configured for only decryption, so we can't just trigger this
behaviour on a key being configured.

Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20200326150855.6865c7f28a14.I9fb1d911b064262d33e33dfba730cdeef83926ca@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

a0761a30

devlink: Rely on driver eswitch thread safety instead of devlink · 98fed6eb

Committed by Parav Pandit on Feb 23, 2020

devlink_nl_cmd_eswitch_set_doit() doesn't hold the devlink->lock mutex while
invoking the driver callback. This is likely because eswitch mode setting
involves adding/removing devlink ports, health reporters or
other devlink objects for a devlink device.

So it is the driver's responsibility to ensure a thread-safe eswitch state
transition, whether it happens via sriov legacy enablement or via the devlink
eswitch set callback.

Therefore, the get() callback should also be invoked without holding the
devlink->lock mutex.
A vendor driver can use the same internal lock that it uses during the eswitch
mode set() callback.
This makes the get() and set() implementations symmetric in the devlink core and
in vendor drivers.

Hence, stop holding the devlink->lock mutex during the eswitch get() callback.

Failing to do so results in the deadlock scenario below once the mlx5_core
driver is improved to handle the eswitch mode set critical section invoked
by the devlink and sriov sysfs interfaces in a subsequent patch.

devlink_nl_cmd_eswitch_set_doit()
   mlx5_eswitch_mode_set()
     mutex_lock(esw->mode_lock) <- Lock A
     [...]
     register_devlink_port()
       mutex_lock(&devlink->lock); <- lock B

mutex_lock(&devlink->lock); <- lock B
devlink_nl_cmd_eswitch_get_doit()
   mlx5_eswitch_mode_get()
   mutex_lock(esw->mode_lock) <- Lock A

In a subsequent patch, the mlx5_core driver uses its internal lock during
the get() and set() eswitch callbacks.

Other drivers have been inspected; they either return a constant during
get operations or read the value from an already allocated structure.
Hence it is safe to remove the lock in the get() callback and let the vendor
driver handle it.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

98fed6eb
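
The deadlock described in the commit above is a lock-order inversion: the set path takes lock A and then lock B, while the old get path takes B and then A. The Python sketch below checks acquisition sequences for such an inversion without running any threads. It is a deliberately simplified model: the helper names are hypothetical, locks are assumed to nest without being released, and only two-lock (ABBA) inversions are detected.

```python
def lock_order_edges(path):
    """Return the set of (held, acquired) pairs for one code path."""
    edges = set()
    held = []
    for lock in path:
        for h in held:
            edges.add((h, lock))
        held.append(lock)
    return edges


def has_inversion(paths):
    """True if two paths acquire the same pair of locks in opposite order."""
    edges = set()
    for path in paths:
        edges |= lock_order_edges(path)
    return any((b, a) in edges for (a, b) in edges)


# Acquisition order taken from the call chains in the commit message.
set_path = ["esw->mode_lock", "devlink->lock"]
get_path_before = ["devlink->lock", "esw->mode_lock"]
# After the change, get() no longer runs under devlink->lock.
get_path_after = ["esw->mode_lock"]

deadlock_before = has_inversion([set_path, get_path_before])
deadlock_after = has_inversion([set_path, get_path_after])
```

With the devlink lock dropped from the get path, only one ordering of the two locks remains, so the inversion disappears.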

net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build · 2c64605b

Committed by Pablo Neira Ayuso on Mar 25, 2020

net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
    net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
      pkt->skb->tc_redirected = 1;
              ^~
    net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
      pkt->skb->tc_from_ingress = 1;
              ^~

To avoid a direct dependency with tc actions from netfilter, wrap the
redirect bits around CONFIG_NET_REDIRECT and move helpers to
include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
only existing client of these bits in the tree.

This patch adds skb_set_redirected(), which sets the redirected bit
on the skbuff, specifies whether the packet was redirected from ingress,
and resets the timestamp (the timestamp reset was originally missing in the
netfilter bugfix).

Fixes: bcfabee1 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress")
Reported-by: noreply@ellerman.id.au
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c64605b

net: use indirect call wrappers for skb_copy_datagram_iter() · 29f3490b

Committed by Eric Dumazet on Mar 24, 2020

TCP recvmsg() calls skb_copy_datagram_iter(), which
calls an indirect function (cb pointing to simple_copy_to_iter())
for every MSS (fragment) present in the skb.

CONFIG_RETPOLINE=y forces a very expensive operation
that we can avoid thanks to indirect call wrappers.

This patch gives a 13% increase of performance on
a single flow, if the bottleneck is the thread reading
the TCP socket.

Fixes: 950fcaec ("datagram: consolidate datagram copy to iter helpers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29f3490b

25 Mar, 2020 9 commits

nl80211: fix NL80211_ATTR_CHANNEL_WIDTH attribute type · 0016d320

Committed by Johannes Berg on Mar 25, 2020

The new opmode notification used this attribute with a u8, when
it's documented as a u32 and indeed used in userspace as such.
It just happens to work on little-endian systems since userspace
isn't doing any strict size validation, and the u8 goes into the
lower byte. Fix this.

Cc: stable@vger.kernel.org
Fixes: 466b9936 ("cfg80211: Add support to notify station's opmode change to userspace")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20200325090531.be124f0a11c7.Iedbf4e197a85471ebd729b186d5365c0343bf7a8@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

0016d320
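
Why a u8 attribute read as a u32 "just happens to work" on little-endian systems can be shown with plain byte arithmetic. The Python sketch below is a model, not nl80211 code: it assumes the one-byte payload is followed by three zero padding bytes (netlink attributes are padded to 4-byte alignment) and that userspace reads four bytes without checking the attribute length. The value 3 is a hypothetical channel-width value.

```python
def u8_attr_payload(value):
    """One byte of data followed by three zero padding bytes (assumed)."""
    return bytes([value]) + bytes(3)


def read_u32(payload, byteorder):
    """What a reader sees when it interprets the first 4 bytes as a u32."""
    return int.from_bytes(payload[:4], byteorder)


width = 3  # hypothetical channel-width value
payload = u8_attr_payload(width)

seen_little_endian = read_u32(payload, "little")
seen_big_endian = read_u32(payload, "big")
```

On a little-endian reader the single byte lands in the low-order position and the value survives; on a big-endian reader it lands in the high-order byte and the value is shifted left by 24 bits.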

ethtool: fix incorrect tx-checksumming settings reporting · 9d648fb5

Committed by Vladyslav Tarasiuk on Mar 24, 2020

Currently, the ethtool feature mask for the checksum command is ORed with
NETIF_F_FCOE_CRC_BIT, which is the bit's position number, instead of the
actual feature bit - NETIF_F_FCOE_CRC.

The invalid bitmask here might affect unrelated features when toggling
TX checksumming. For example, TX checksumming is always mistakenly
reported as enabled on the netdevs tested (mlx5, virtio_net).

Fixes: f70bb065 ("ethtool: update mapping of features to legacy ioctl requests")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d648fb5
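
The class of bug fixed above, OR-ing a bit's position number into a mask instead of the bit itself, is easy to reproduce with small integers. In the Python sketch below every numeric value is hypothetical and chosen only for illustration; the constants mimic the kernel names but are not the kernel's actual values.

```python
# Hypothetical bit positions; the real kernel enum values may differ.
FCOE_CRC_BIT = 29                # position number of the feature bit
FCOE_CRC = 1 << FCOE_CRC_BIT     # the actual feature bit

# Hypothetical mask of other TX checksum feature bits (positions 1, 3, 4).
CSUM_MASK = (1 << 1) | (1 << 3) | (1 << 4)

buggy_mask = CSUM_MASK | FCOE_CRC_BIT   # ORs in the number 29 (0b11101)
fixed_mask = CSUM_MASK | FCOE_CRC       # ORs in bit 29 itself

# Bits the buggy mask sets that the intended mask does not.
stray_bits = buggy_mask & ~fixed_mask
```

The buggy mask never contains the intended feature bit, and instead picks up unrelated low-order bits (positions 0 and 2 with these values), which is how unrelated features can end up affecting the reported checksumming state.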

net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop · e80f40cb

Committed by Vladimir Oltean on Mar 24, 2020

Not only did this wheel not need reinventing, but there is also
an issue with it: It doesn't remove the VLAN header in a way that
preserves the L2 payload checksum when that is being provided by the DSA
master hw. It should recalculate checksum both for the push, before
removing the header, and for the pull afterwards. But the current
implementation is quite dizzying, with pulls followed immediately
afterwards by pushes, the memmove is done before the push, etc. This
makes a DSA master with RX checksumming offload print stack traces
with the infamous 'hw csum failure' message.

So remove the dsa_8021q_remove_header function and replace it with
something that actually works with inet checksumming.

Fixes: d4619336 ("net: dsa: tag_8021q: Create helper function for removing VLAN header")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e80f40cb

net: cbs: Fix software cbs to consider packet sending time · 961d0e5b

Committed by Zh-yuan Ye on Mar 24, 2020

Currently the software CBS does not consider the packet sending time
when depleting the credits. This causes the throughput to be
Idleslope[kbps] * (Port transmit rate[kbps] / |Sendslope[kbps]|), whereas
Idleslope * (Port transmit rate / (Idleslope + |Sendslope|)) = Idleslope
is expected. To fix this, the patch takes the time
when the packet sending completes into account by moving the anchor time
variable "last" ahead to the send completion time upon transmission and
adding a wait when the next dequeue request comes before the send
completion time of the previous packet.

changelog:
V2->V3:
 - remove unnecessary whitespace cleanup
 - add the checks if port_rate is 0 before division

V1->V2:
 - combine variable "send_completed" into "last"
 - add the comment for estimate of the packet sending

Fixes: 585d763a ("net/sched: Introduce Credit Based Shaper (CBS) qdisc")
Signed-off-by: NZh-yuan Ye <ye.zh-yuan@socionext.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

961d0e5b
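
The two throughput expressions quoted in the commit above can be evaluated numerically. The Python sketch below only reproduces that arithmetic; it assumes the usual CBS relation sendslope = idleslope - port rate, and the 20 Mbit/s idleslope on a 100 Mbit/s port is a hypothetical configuration.

```python
def sendslope(idleslope, port_rate):
    """Assumed CBS relation: sendslope = idleslope - port_rate (negative)."""
    return idleslope - port_rate


def throughput_before_fix(idleslope, port_rate):
    """Idleslope * (port rate / |sendslope|), as quoted in the commit."""
    return idleslope * (port_rate / abs(sendslope(idleslope, port_rate)))


def throughput_expected(idleslope, port_rate):
    """Idleslope * (port rate / (idleslope + |sendslope|))."""
    s = abs(sendslope(idleslope, port_rate))
    return idleslope * (port_rate / (idleslope + s))


# Hypothetical configuration, in kbps.
idle = 20000
rate = 100000

before = throughput_before_fix(idle, rate)
expected = throughput_expected(idle, rate)
```

Because idleslope + |sendslope| equals the port rate under the assumed relation, the expected expression collapses to the idleslope, while the pre-fix expression overshoots it (by 25% with these numbers).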

netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress · bcfabee1

Committed by Pablo Neira Ayuso on Mar 23, 2020

Set skb->tc_redirected to 1, otherwise the ifb driver drops the packet.
Set skb->tc_from_ingress to 1 to reinject the packet back to the ingress
path after leaving the ifb egress path.

This patch unconditionally sets these two skb fields that are
meaningful to the ifb driver. The existing forward action is guaranteed
to run from ingress path.

Fixes: 39e6dea2 ("netfilter: nf_tables: add forward expression to the netdev family")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

bcfabee1

netfilter: nft_fwd_netdev: validate family and chain type · 76a109fa

Committed by Pablo Neira Ayuso on Mar 23, 2020

Make sure the forward action is only used from ingress.

Fixes: 39e6dea2 ("netfilter: nf_tables: add forward expression to the netdev family")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

76a109fa

netfilter: nft_set_rbtree: Detect partial overlaps on insertion · 7c84d414

Committed by Stefano Brivio on Mar 22, 2020

...and return -ENOTEMPTY to the front-end in this case, instead of
proceeding. Currently, nft takes care of checking for these cases
and not sending them to the kernel, but if we drop the set_overlap()
call in nft we can end up in situations like:

 # nft add table t
 # nft add set t s '{ type inet_service ; flags interval ; }'
 # nft add element t s '{ 1 - 5 }'
 # nft add element t s '{ 6 - 10 }'
 # nft add element t s '{ 4 - 7 }'
 # nft list set t s
 table ip t {
 	set s {
 		type inet_service
 		flags interval
 		elements = { 1-3, 4-5, 6-7 }
 	}
 }

This change has the primary purpose of making the behaviour
consistent with nft_set_pipapo, but is also functional to avoid
inconsistent behaviour if userspace sends overlapping elements for
any reason.

v2: When we meet the same key data in the tree, as start element while
    inserting an end element, or as end element while inserting a start
    element, actually check that the existing element is active, before
    resetting the overlap flag (Pablo Neira Ayuso)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7c84d414
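
The insertion check described in the commit above can be sketched as a plain interval test. This Python model is not the rbtree code: it keeps closed intervals in a list, treats any intersection that is not an exact match as a partial overlap, and returns -EEXIST for an identical element as described for nft_set_pipapo elsewhere in this log. The function name is hypothetical.

```python
import errno


def insert_interval(elements, start, end):
    """Insert the closed interval [start, end] unless it collides.

    Returns 0 on success, -EEXIST for an identical element, and
    -ENOTEMPTY for any other intersection with an existing element.
    """
    for (s, e) in elements:
        if (s, e) == (start, end):
            return -errno.EEXIST
        if start <= e and s <= end:
            return -errno.ENOTEMPTY
    elements.append((start, end))
    return 0


# The sequence from the commit message: 1-5, 6-10, then 4-7.
port_set = []
r1 = insert_interval(port_set, 1, 5)
r2 = insert_interval(port_set, 6, 10)
r3 = insert_interval(port_set, 4, 7)    # partially overlaps both
r4 = insert_interval(port_set, 1, 5)    # identical to an existing element
```

With the check in place the overlapping element is rejected and the set keeps its original two intervals, instead of ending up split as in the listing above.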

netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start() · 6f7c9caf

Committed by Stefano Brivio on Mar 22, 2020

Replace negations of nft_rbtree_interval_end() with a new helper,
nft_rbtree_interval_start(), wherever this helps to visualise the
problem at hand, that is, for all the occurrences except for the
comparison against given flags in __nft_rbtree_get().

This gets especially useful in the next patch.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6f7c9caf

netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion · 0eb4b5ee

Committed by Stefano Brivio on Mar 22, 2020

...and return -ENOTEMPTY to the front-end on collision, -EEXIST if
an identical element already exists. Together with the previous patch,
element collision will now be returned to the user as -EEXIST.

Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

0eb4b5ee