Commits · 4a84182afc1d35e4e3d0c57fb1836f0bd33706f5 · openeuler / Kernel

28 Feb, 2020 (28 commits)

dpaa2-eth: add support for mii ioctls · 4a84182a

Committed by Russell King on Feb 27, 2020

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell10g: read copper results from CSSR1 · c84786fa

Committed by Russell King on Feb 27, 2020

Read the copper autonegotiation results from the copper specific
status register, rather than decoding the advertisements. Reading
what the link is actually doing will allow us to support downshift
modes.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
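As a rough illustration of why reading the status register matters for downshift, here is a toy model in Python. It is not the marvell10g driver: the function names and the idea of passing speeds as plain integers (Mb/s) are assumptions made for this sketch.

```python
# Toy model only; none of these names come from the driver.

def resolve_from_advertisements(local, partner):
    """Guess the link speed as the fastest mode both ends advertise."""
    common = set(local) & set(partner)
    return max(common) if common else None

def resolve_from_status(reported_speed):
    """Trust the speed the PHY reports for the established link."""
    return reported_speed

local = {100, 1000, 10000}
partner = {100, 1000, 10000}

# Suppose the cable cannot sustain 10G and the PHY downshifts to 1G.
reported = 1000

guess = resolve_from_advertisements(local, partner)   # still says 10000
actual = resolve_from_status(reported)                # says 1000
```

Decoding the advertisements can only say what the link could do, whereas the status value reflects what it ended up doing.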

Merge branch 's390-qeth-next' · be64e397

Committed by David S. Miller on Feb 27, 2020

Julian Wiedmann says:

====================
s390/qeth: updates 2020-02-27

please apply the following patch series for qeth to netdev's net-next
tree.

This adds support for ETHTOOL_RX_COPYBREAK, along with small cleanups
and fine-tuning.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: support configurable RX copybreak · 562cf773

Committed by Julian Wiedmann on Feb 27, 2020

Implement the ethtool hooks for the ETHTOOL_RX_COPYBREAK tunable.

The copybreak is stored into netdev_priv, so that we automatically go
back to the default value if the netdev is re-allocated.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
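For readers unfamiliar with the term, the general copybreak idea can be sketched as follows. This is a Python toy, not qeth code: the default of 256 bytes, the `<` comparison, and all names are assumptions for illustration.

```python
DEFAULT_RX_COPYBREAK = 256  # assumed default, in bytes

class PrivModel:
    """Stands in for per-netdev private data: a fresh one gets the default."""
    def __init__(self):
        self.rx_copybreak = DEFAULT_RX_COPYBREAK

def receive(priv, rx_buffer, length):
    """Return (payload, rx_buffer_can_be_reused)."""
    if length < priv.rx_copybreak:
        # Small packet: copy it out so the large RX buffer can be reused.
        return bytes(rx_buffer[:length]), True
    # Large packet: hand over the buffer itself instead of copying.
    return memoryview(rx_buffer)[:length], False
```

Because the threshold lives in the per-device object, re-creating that object in this model resets the tunable, which mirrors the behaviour the commit message describes.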

s390/qeth: don't check for IFF_UP when scheduling napi · 3d35dbe6

Committed by Julian Wiedmann on Feb 27, 2020

Trust the napi_disable() in qeth_stop() to handle this.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: don't re-start read cmd when IDX has terminated · 3a5bad64

Committed by Julian Wiedmann on Feb 27, 2020

Once the IDX connection is down, there's no point in trying to issue
more IOs.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: reset seqnos on connection startup · 7f23d55f

Committed by Julian Wiedmann on Feb 27, 2020

This lets us start every new IDX connection with clean seqnos.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: remove unused cmd definitions · d74e5e84

Committed by Julian Wiedmann on Feb 27, 2020

Looks like these were never used, ever since the driver was initially
added.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: validate device-provided MAC address · 13bf8295

Committed by Julian Wiedmann on Feb 27, 2020

It's good practice to not blindly trust what the HW offers.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
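What such a validation typically looks like can be sketched in Python. The check below follows the common rule for a usable unicast Ethernet address (not all-zero, group bit clear); whether the qeth patch applies exactly this rule is an assumption, and the function name is made up for the example.

```python
def is_valid_unicast_mac(mac: bytes) -> bool:
    """Accept only a 6-byte address that is neither all-zero nor multicast."""
    if len(mac) != 6:
        return False
    is_multicast = bool(mac[0] & 0x01)  # group bit; also covers broadcast
    is_zero = mac == bytes(6)
    return not is_multicast and not is_zero
```

A caller would reject the device-provided address when this returns False rather than installing it on the netdev.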

s390/qeth: clean up CREATE_ADDR cmd code · 9c6dc7af

Committed by Julian Wiedmann on Feb 27, 2020

Properly define the cmd's struct to get rid of some casts and accesses
at magic offsets.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: remove dead code in qeth_l3_iqd_read_initial_mac() · 6bbfece5

Committed by Julian Wiedmann on Feb 27, 2020

card->info.unique_id is always 0 for IQD devices, so don't bother with
copying it into the 0-initialized cmd.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'selftests-updates-for-mlxsw-driver-test' · 22339f2f

Committed by David S. Miller on Feb 27, 2020

Jiri Pirko says:

====================
selftests: updates for mlxsw driver test

This patchset contains tweaks to the existing tests and also adds a
couple of new ones, namely tests for shared buffer and RED offload.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: resource_scale: Invoke for Spectrum-3 · 3eba4137

Committed by Amit Cohen on Feb 27, 2020

The scale test for Spectrum-2 should be invoked for Spectrum-2 and
Spectrum-3. Add the appropriate device ID.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Reduce router scale running time using offload indication · e781eeda

Committed by Danielle Ratson on Feb 27, 2020

Currently, the test inserts X /32 routes and for each route it is
testing that a packet sent from the first host is received by the second
host, which is very time-consuming.

Instead only validate the offload flag of each route and get the same result.

Wait between the creation of the routes and the offload validation in
order to make sure that all the routes were successfully offloaded.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Reduce running time using offload indication · abfce9e0

Committed by Danielle Ratson on Feb 27, 2020

After adding a given number of flower rules for different IPv6
addresses, the test generates traffic and ensures that each packet is
received, which is time-consuming.

Instead, test the offload indication of the tc flower rules and reduce
the running time by half.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Add shared buffer traffic test · a865ad99

Committed by Shalom Toledo on Feb 27, 2020

Test the max shared buffer occupancy for port's pool and port's TC's (using
different types of packets).

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Add mlxsw lib · 4240dbd8

Committed by Shalom Toledo on Feb 27, 2020

Add mlxsw lib for common defines, helpers etc.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: devlink_lib: Add devlink port helpers · 9fb74734

Committed by Shalom Toledo on Feb 27, 2020

Add two devlink port helpers:
 * devlink port get by netdev
 * devlink cpu port get

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: devlink_lib: Check devlink info command is supported · 552ec3d9

Committed by Shalom Toledo on Feb 27, 2020

Sanity check for devlink info command.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Add shared buffer configuration test · 6697b51e

Committed by Shalom Toledo on Feb 27, 2020

Test physical ports' shared buffer configuration options using random
values related to a specific configuration option. There are 3
configuration options: pool, TC bind and portpool.

Each sub-test tests a different configuration option and randomizes the
related values as follows:
 * For pools, pool's size will be randomized.
 * For TC bind, pool number and threshold will be randomized.
 * For portpools, threshold will be randomized.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Use busywait helper in rtnetlink test · 1cbe65e0

Committed by Danielle Ratson on Feb 27, 2020

Rtnetlink test uses offload indication checks.

Use a busywait helper and wait until the offload indication is set or
fail if it reaches timeout.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
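The busywait pattern used by this and the following two commits can be sketched in Python. The real helper is a shell function in the selftests library; the signature below (timeout in seconds, a predicate callable) is an assumption chosen for the sketch.

```python
import time

def busywait(timeout_s, predicate, *args):
    """Poll predicate(*args) until it returns true or the timeout expires.

    Returns True on success and False on timeout. The predicate is always
    tried at least once, even with a zero timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate(*args):
            return True
        if time.monotonic() >= deadline:
            return False
```

Compared with a fixed sleep followed by a single check, this returns as soon as the offload indication appears and only costs the full timeout when the check genuinely fails.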

selftests: mlxsw: Use busywait helper in vxlan test · 05ef614c

Committed by Danielle Ratson on Feb 27, 2020

Vxlan test uses offload indication checks.

Use a busywait helper and wait until the offload indication is set or
fail if it reaches timeout.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Use busywait helper in blackhole routes test · 0c22f993

Committed by Danielle Ratson on Feb 27, 2020

Blackhole routes test uses offload indication checks.

Use a busywait helper and wait until the routes' offload indication is set or
fail if it reaches timeout.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: devlink_trap_l3_drops: Avoid race condition · 5d66773f

Committed by Ido Schimmel on Feb 27, 2020

The test checks that packets are trapped when they should egress a
router interface (RIF) that has become disabled. This is a temporary
state in a RIF's deletion sequence.

Currently, the test deletes the RIF by flushing all the IP addresses
configured on the associated netdev (br0). However, this is racy, as
this also flushes all the routes pointing to the netdev and if the
routes are deleted from the device before the RIF is disabled, then no
packets will try to egress the disabled RIF and the trap will not be
triggered.

Instead, trigger the deletion of the RIF by unlinking the mlxsw port
from the bridge that is backing the RIF. Unlike before, this will not
cause the kernel to delete the routes pointing to the bridge.

Note that due to current mlxsw locking scheme the RIF is always deleted
first, but this is going to change.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: add a mirror test to mlxsw tc flower restrictions · ab2b8ab2

Committed by Jiri Pirko on Feb 27, 2020

Include a test that multiple mirror actions are forbidden.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: add egress redirect test to mlxsw tc flower restrictions · c84e903f

Committed by Jiri Pirko on Feb 27, 2020

Include a test that a redirect rule on an egress-bound block is forbidden.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Add a RED selftest · 3de611b5

Committed by Petr Machata on Feb 27, 2020

This tests that below the queue minimum length, there is no dropping /
marking, and above max, everything is dropped / marked.

The test is structured as a core file with topology and test code, and
three wrappers: one for RED used as a root Qdisc, and two for
testing (W)RED under PRIO and ETS.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
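The two regimes the test exercises correspond to the ends of the textbook RED curve, sketched below in Python. This is the generic algorithm shape, not the hardware's or the selftest's code; the linear ramp between the thresholds and the parameter names are assumptions of the sketch.

```python
def red_drop_probability(avg_qlen, qmin, qmax, max_p):
    """Textbook RED: never drop below qmin, always drop at or above qmax."""
    if avg_qlen < qmin:
        return 0.0
    if avg_qlen >= qmax:
        return 1.0
    # Between the thresholds the probability ramps linearly up to max_p.
    return max_p * (avg_qlen - qmin) / (qmax - qmin)
```

With ECN enabled the same probability would govern marking instead of dropping, which is why the commit message speaks of "dropping / marking".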

selftests: forwarding: lib.sh: Add start_tcp_traffic · 4113b048

Committed by Petr Machata on Feb 27, 2020

Extract a helper __start_traffic() configurable by protocol type. Allow
passing through extra mausezahn arguments. Add a wrapper,
start_tcp_traffic().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Feb, 2020 (12 commits)

Merge branch 'VLANs-DSA-switches-and-multiple-bridges' · 2b99e54b

Committed by David S. Miller on Feb 26, 2020

Russell King says:

====================
VLANs, DSA switches and multiple bridges

This is a repost of the previously posted RFC back in December, which
did not get fully reviewed.  I've dropped the RFC tag this time as no
one really found anything too problematical in the RFC posting.

I've been trying to configure DSA for VLANs and not having much success.
The setup is quite simple:

- The main network is untagged
- The wifi network is a vlan tagged with id $VN running over the main
  network.

I have an Armada 388 Clearfog with a PCIe wifi card which I'm trying to
set up to provide wifi access to the vlan $VN network, while the switch
is also part of the main network.

However, I'm encountering problems:

1) vlan support in DSA has a different behaviour from the Linux
   software bridge implementation.

    # bridge vlan
    port    vlan ids
    lan1     1 PVID Egress Untagged
    ...

   shows the default setup - the bridge ports are all configured for
   vlan 1, untagged egress, and vlan 1 as the port vid.  Issuing:

    # ip li set dev br0 type bridge vlan_filtering 1

   with no other vlan configuration commands on a Linux software bridge
   continues to allow untagged traffic to flow across the bridge.

   This difference in behaviour is because the MV88E6xxx VTU is
   completely empty - because net/dsa ignores all vlan settings for
   a port if br_vlan_enabled(dp->bridge_dev) is false - this reflects
   the vlan filtering state of the bridge, not whether the bridge is
   vlan aware.

   What this means is that attempting to configure the bridge port
   vlans before enabling vlan filtering works for Linux software
   bridges, but fails for DSA bridges.

2) Assuming the above is sorted, we move on to the next issue, which
   is altogether more weird.  Let's take a setup where we have a
   DSA bridge with lan1..6 in a bridge device, br0, with vlan
   filtering enabled.  lan1 is the upstream port, lan2 is a downstream
   port that also wants to see traffic on vlan id $VN.

   Both lan1 and lan2 are configured for that:

     # bridge vlan add vid $VN dev lan1
     # bridge vlan add vid $VN dev lan2
     # ip li set br0 type bridge vlan_filtering 1

   Untagged traffic can now pass between all the six lan ports, and
   vlan $VN between lan1 and lan2 only.  The MV88E6xxx 8021q_mode
   debugfs file shows all lan ports are in mode "secure" - this is
   important!  /sys/class/net/br0/bridge/vlan_filtering contains 1.

   tcpdumping from another machine on lan4 shows that no $VN traffic
   reaches it.  Everything seems to be working correctly...

   In order to further bridge vlan $VN traffic to hostapd's wifi
   interface, things get a little more complex - we can't add hostapd's
   wifi interface to br0 directly, because hostapd will bring up the
   wifi interface and leak the main, untagged traffic onto the wifi.
   (hostapd does have vlan support, but only as a dynamic per-client
   thing, and there's no hooks I can see to allow script-based config
   of the network setup before hostapd up's the wifi interface.)

   So, what I tried was:

     # ip li add link br0 name br0.$VN type vlan id $VN
     # bridge vlan add vid $VN dev br0 self
     # ip li set dev br0.$VN up

   So far so good, we get a vlan interface on top of the bridge, and
   tcpdumping it shows we get traffic.  The 8021q_mode file has not
   changed state.  Everything still seems to be correct.

     # bridge addbr br1

   Still nothing has changed.

     # bridge addif br1 br0.$VN

   And now the 8021q_mode debugfs file shows that all ports are now in
   "disabled" mode, but /sys/class/net/br0/bridge/vlan_filtering still
   contains '1'.  In other words, br0 still thinks vlan filtering is
   enabled, but the hardware has had vlan filtering disabled.

   Adding some stack traces to an appropriate point indicates that this
   is because __switchdev_handle_port_attr_set() recurses down through
   the tree of interfaces, skipping over the vlan interface, applying
   br1's configuration to br0's ports.

   This surely can not be right - surely
   __switchdev_handle_port_attr_set() and similar should stop recursing
   down through another master bridge device?  There are probably other
   network device classes that switchdev shouldn't recurse down too.

   I've considered whether switchdev is the right level to do it, and
   I think it is - as we want the check/set callbacks to be called for
   the top level device even if it is a master bridge device, but we
   don't want to recurse through a lower master bridge device.

v2: dropped patch 3, since that has an outstanding issue, and my
question on it has not been answered.  Otherwise, these are the
same patches.  Maybe we can move forward with just these two?

v3: include DSA ports in patch 2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: fix duplicate vlan warning · 933b4425

Committed by Russell King on Feb 26, 2020

When setting VLANs on DSA switches, the VLAN is added to both the port
concerned as well as the CPU port by dsa_slave_vlan_add(), as well as
any DSA ports.  If multiple ports are configured with the same VLAN ID,
this triggers a warning on the CPU and DSA ports.

Avoid this warning for CPU and DSA ports.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: switchdev: do not propagate bridge updates across bridges · 07c6f980

Committed by Russell King on Feb 26, 2020

When configuring a tree of independent bridges, propagating changes
from the upper bridge across a bridge master to the lower bridge
ports brings surprises.

For example, a lower bridge may have vlan filtering enabled.  It
may have a vlan interface attached to the bridge master, which may
then be incorporated into another bridge.  As soon as the lower
bridge vlan interface is attached to the upper bridge, the lower
bridge has vlan filtering disabled.

This occurs because switchdev recursively applies its changes to
all lower devices no matter what.

Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
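The propagation problem described in this commit can be reproduced with a small Python model of the device tree. This is only a sketch: the `Dev` class, the `kind` strings and the `stop_at_lower_bridge` switch are invented for illustration and do not correspond to switchdev data structures.

```python
class Dev:
    """Toy network device; 'lowers' are the devices stacked beneath it."""
    def __init__(self, name, kind, lowers=()):
        self.name = name
        self.kind = kind          # "port", "bridge" or "vlan"
        self.lowers = list(lowers)

def propagate(dev, stop_at_lower_bridge, _top=True):
    """Return the names of the switch ports an attribute change reaches."""
    if dev.kind == "port":
        return [dev.name]
    if dev.kind == "bridge" and not _top and stop_at_lower_bridge:
        return []                 # another bridge master: leave it alone
    reached = []
    for lower in dev.lowers:
        reached += propagate(lower, stop_at_lower_bridge, _top=False)
    return reached

# Topology from the cover letter: br1 -> br0.VN -> br0 -> lan1, lan2
lan1, lan2 = Dev("lan1", "port"), Dev("lan2", "port")
br0 = Dev("br0", "bridge", [lan1, lan2])
br0_vlan = Dev("br0.VN", "vlan", [br0])
br1 = Dev("br1", "bridge", [br0_vlan])
```

In this model, `propagate(br1, False)` reaches lan1 and lan2 (the surprising behaviour), while `propagate(br1, True)` reaches nothing and `propagate(br0, True)` still reaches br0's own ports.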

net: qrtr: Fix error pointer vs NULL bugs · 9baeea50

Committed by Dan Carpenter on Feb 26, 2020

The callers only expect NULL pointers, so returning an error pointer
will lead to an Oops.

Fixes: 0c2204a4 ("net: qrtr: Migrate nameservice to kernel from userspace")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: add missing shift for media operation mode selection · 1ac7b090

Committed by Antoine Tenart on Feb 26, 2020

This patch adds a missing shift for the media operation mode selection.
This does not fix the driver as the current operation mode (copper) has
a value of 0, but this wouldn't work for other modes.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
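Why the missing shift was harmless for copper but wrong in general can be shown with a few lines of Python. The field position, mask width and the non-zero mode value below are hypothetical; only "copper is 0" comes from the commit message.

```python
MEDIA_OP_MODE_SHIFT = 8                        # hypothetical field position
MEDIA_OP_MODE_MASK = 0x7 << MEDIA_OP_MODE_SHIFT
MODE_COPPER = 0                                # value stated in the commit
MODE_OTHER = 2                                 # hypothetical non-zero mode

def set_mode_without_shift(reg, mode):
    """Buggy variant: the mode value is OR-ed in at bit 0."""
    return (reg & ~MEDIA_OP_MODE_MASK) | mode

def set_mode_with_shift(reg, mode):
    """Fixed variant: the mode value is moved into its field first."""
    return (reg & ~MEDIA_OP_MODE_MASK) | (mode << MEDIA_OP_MODE_SHIFT)
```

For `MODE_COPPER` both variants produce the same register value, since shifting zero changes nothing; for any non-zero mode the unshifted variant leaves the field empty and pollutes unrelated low bits.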

net: ena: fix broken interface between ENA driver and FW · 92040c6d

Committed by Arthur Kiyanovski on Feb 26, 2020

In this commit we revert the part of
commit 1a63443a ("net/amazon: Ensure that driver version is aligned to the linux kernel"),
which breaks the interface between the ENA driver and FW.

We also replace the use of DRIVER_VERSION with DRIVER_GENERATION
when we bring back the deleted constants that are used in interface with
ENA device FW.

This commit does not change the driver version reported to the user via
ethtool, which remains the kernel version.

Fixes: 1a63443a ("net/amazon: Ensure that driver version is aligned to the linux kernel")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mptcp-update-mptcp-ack-sequence-outside-of-recv-path' · 621135a0

Committed by David S. Miller on Feb 26, 2020

Florian Westphal says:

====================
mptcp: update mptcp ack sequence outside of recv path

This series moves mptcp-level ack sequence update outside of the recvmsg path.
Current approach has two problems:

1. There is delay between arrival of new data and the time we can ack
   this data.
2. If userspace doesn't call recv for some time, mptcp ack_seq is not
   updated at all, even if this data is queued in the subflow socket
   receive queue.

Move skbs from the subflow socket receive queue to the mptcp-level
receive queue, updating the mptcp-level ack sequence and have recv
take skbs from the mptcp-level receive queue.

The first place where we will attempt to update the mptcp level acks
is from the subflows' data_ready callback, even before we make userspace
aware of new data.

Because of possible deadlock (we need to take the mptcp socket lock
while already holding the subflow sockets lock), we may still need to
defer the mptcp-level ack update.  In such case, this work will be either
done from work queue or recv path, depending on which runs sooner.

In order to avoid pointless scheduling of the work queue, work
will be queued from the mptcp sockets lock release callback.
This allows us to detect when the socket owner has drained the subflow
socket receive queue.

Please see individual patches for more information.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: defer work schedule until mptcp lock is released · 14c441b5

Committed by Paolo Abeni on Feb 26, 2020

Don't schedule the work queue right away, instead defer this
to the lock release callback.

This has the advantage that it will give recv path a chance to
complete -- this might have moved all pending packets from the
subflow to the mptcp receive queue, which allows us to avoid the
schedule_work().

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: avoid work queue scheduling if possible · 2e52213c

Committed by Florian Westphal on Feb 26, 2020

We can't lock_sock() the mptcp socket from the subflow data_ready callback:
it would result in an ABBA deadlock with the subflow socket lock.

We can however grab the spinlock: if that succeeds and the mptcp socket
is not owned at the moment, we can process the new skbs right away
without deferring this to the work queue.

This avoids the schedule_work and hence the small delay until the
work item is processed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
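The "process inline if the socket is not owned, otherwise defer" decision can be modelled in Python roughly as follows. This is a toy: `threading.Lock` stands in for the socket spinlock, and the class, field and return values are assumptions made for the sketch rather than mptcp internals.

```python
import threading

class MptcpSockModel:
    """Toy stand-in for the mptcp socket state touched by data_ready."""
    def __init__(self):
        self.spinlock = threading.Lock()   # models the short-held spinlock
        self.owned_by_user = False         # models "socket lock is held"
        self.receive_queue = []
        self.deferred = 0                  # how often work was deferred

def data_ready(msk, skb):
    """Queue skb right away when possible, else leave it for later."""
    with msk.spinlock:
        if not msk.owned_by_user:
            msk.receive_queue.append(skb)
            return "inline"
        # Owner is busy: someone else (worker or recv path) must do it.
        msk.deferred += 1
        return "deferred"
```

The point of the pattern is that the cheap lock is always safe to take from the callback, while the expensive, sleepable ownership is only checked, never waited for.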

mptcp: remove mptcp_read_actor · bfae9dae

Committed by Florian Westphal on Feb 26, 2020

Only used to discard stale data from the subflow, so move
it where needed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: add rmem queue accounting · 600911ff

Committed by Florian Westphal on Feb 26, 2020

If userspace never drains the receive buffers we must stop draining
the subflow socket(s) at some point.

This adds the needed rmem accounting for this.
If the threshold is reached, we stop draining the subflows.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
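A minimal Python sketch of this kind of receive-memory accounting is shown below. The names, the byte-count bookkeeping and the exact threshold test are assumptions; the kernel accounts skb truesize and may compare differently.

```python
from collections import deque

class MskModel:
    """Toy mptcp-level socket with a receive buffer limit in bytes."""
    def __init__(self, rcvbuf):
        self.rcvbuf = rcvbuf
        self.rmem_alloc = 0
        self.receive_queue = deque()

def drain_subflow(msk, subflow_queue):
    """Move skbs to the mptcp queue until the limit would be exceeded."""
    moved = 0
    while subflow_queue:
        skb = subflow_queue[0]
        if msk.rmem_alloc + len(skb) > msk.rcvbuf:
            break                          # threshold reached: stop draining
        msk.receive_queue.append(subflow_queue.popleft())
        msk.rmem_alloc += len(skb)
        moved += 1
    return moved

def recv(msk):
    """Userspace read: hand out one skb and release its accounted memory."""
    skb = msk.receive_queue.popleft()
    msk.rmem_alloc -= len(skb)
    return skb
```

Data left on the subflow queue is picked up by a later drain once `recv()` has freed enough accounted memory.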

mptcp: update mptcp ack sequence from work queue · 6771bfd9

Committed by Florian Westphal on Feb 26, 2020

If userspace is not reading data, all the mptcp-level acks contain the
ack_seq from the last time userspace read data rather than the most
recent in-sequence value.

This causes pointless retransmissions for data that is already queued.

The reason for this is that all the mptcp protocol level processing
happens at mptcp_recv time.

This adds a work queue to move skbs from the subflow sockets' receive
queue to the mptcp socket receive queue (which was not used so far).

This allows us to announce the correct mptcp ack sequence in a timely
fashion, even when the application does not call recv() on the mptcp socket
for some time.

We still wake userspace tasks waiting for POLLIN immediately:
If the mptcp level receive queue is empty (because the work queue is
still pending) it can be filled from in-sequence subflow sockets at
recv time without a need to wait for the worker.

The skb_orphan when moving skbs from subflow to mptcp level is needed,
because the destructor (sock_rfree) relies on skb->sk (ssk!) lock
being taken.

A followup patch will add needed rmem accounting for the moved skbs.

Other problem: If the application behaves as expected, and calls
recv() as soon as mptcp socket becomes readable, the work queue will
only waste cpu cycles.  This will also be addressed in followup patches.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
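The effect on the announced ack sequence can be illustrated with a small Python model. It is a sketch under simplifying assumptions: segments are `(seq, data)` tuples that arrive in order on a single subflow, and the class and function names are invented for the example.

```python
class AckModel:
    """Toy mptcp-level state: next expected sequence plus queued data."""
    def __init__(self):
        self.ack_seq = 0
        self.receive_queue = []

def worker(msk, subflow_queue):
    """Move in-sequence segments up, advancing ack_seq as data is queued."""
    while subflow_queue and subflow_queue[0][0] == msk.ack_seq:
        seq, data = subflow_queue.pop(0)
        msk.receive_queue.append(data)
        msk.ack_seq += len(data)

def recv(msk):
    """Userspace read: take data that the worker has already queued."""
    return msk.receive_queue.pop(0) if msk.receive_queue else None
```

In this model `ack_seq` moves forward as soon as the worker runs, whether or not `recv()` is ever called, which is the property that avoids the pointless retransmissions mentioned above.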
