提交 · a31fcbceef37f7a502b8dc70e2c2767e68232e74 · openeuler / Kernel

18 6月, 2021 2 次提交

driver core: add a helper to setup both the of_node and fwnode of a device · 43e76d46

由 Ioana Ciornei 提交于 6月 17, 2021

There are many places where both the fwnode_handle and the of_node of a
device need to be populated. Add a function which does both so that we
have consistency.
Suggested-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

43e76d46

net: fix mistake path for netdev_features_strings · 2d8ea148

由 Jian Shen 提交于 6月 17, 2021

Th_strings arrays netdev_features_strings, tunable_strings, and
phy_tunable_strings has been moved to file net/ethtool/common.c.
So fixes the comment.
Signed-off-by: NJian Shen <shenjian15@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d8ea148

16 6月, 2021 1 次提交

bpf: Support socket migration by eBPF. · d5e4ddae

由 Kuniyuki Iwashima 提交于 6月 12, 2021

This patch introduces a new bpf_attach_type for BPF_PROG_TYPE_SK_REUSEPORT
to check if the attached eBPF program is capable of migrating sockets. When
the eBPF program is attached, we run it for socket migration if the
expected_attach_type is BPF_SK_REUSEPORT_SELECT_OR_MIGRATE or
net.ipv4.tcp_migrate_req is enabled.

Currently, the expected_attach_type is not enforced for the
BPF_PROG_TYPE_SK_REUSEPORT type of program. Thus, this commit follows the
earlier idea in the commit aac3fc32 ("bpf: Post-hooks for sys_bind") to
fix up the zero expected_attach_type in bpf_prog_load_fixup_attach_type().

Moreover, this patch adds a new field (migrating_sk) to sk_reuseport_md to
select a new listener based on the child socket. migrating_sk varies
depending on if it is migrating a request in the accept queue or during
3WHS.

  - accept_queue : sock (ESTABLISHED/SYN_RECV)
  - 3WHS         : request_sock (NEW_SYN_RECV)

In the eBPF program, we can select a new listener by
BPF_FUNC_sk_select_reuseport(). Also, we can cancel migration by returning
SK_DROP. This feature is useful when listeners have different settings at
the socket API level or when we want to free resources as soon as possible.

  - SK_PASS with selected_sk, select it as a new listener
  - SK_PASS with selected_sk NULL, fallbacks to the random selection
  - SK_DROP, cancel the migration.

There is a noteworthy point. We select a listening socket in three places,
but we do not have struct skb at closing a listener or retransmitting a
SYN+ACK. On the other hand, some helper functions do not expect skb is NULL
(e.g. skb_header_pointer() in BPF_FUNC_skb_load_bytes(), skb_tail_pointer()
in BPF_FUNC_skb_load_bytes_relative()). So we allocate an empty skb
temporarily before running the eBPF program.
Suggested-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/netdev/20201123003828.xjpjdtk4ygl6tg6h@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/netdev/20201203042402.6cskdlit5f3mw4ru@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/netdev/20201209030903.hhow5r53l6fmozjn@kafai-mbp.dhcp.thefacebook.com/
Link: https://lore.kernel.org/bpf/20210612123224.12525-10-kuniyu@amazon.co.jp

d5e4ddae

15 6月, 2021 5 次提交

net/mlx5: Enlarge interrupt field in CREATE_EQ · 3af26495

由 Shay Drory 提交于 5月 10, 2021

FW is now supporting more than 256 MSI-X per PF (up to 2K).
Hence, enlarge interrupt field in CREATE_EQ to make use of the new
MSI-X's.
Signed-off-by: NShay Drory <shayd@nvidia.com>
Reviewed-by: NMaor Gottlieb <maorg@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

3af26495

net/mlx5: Provide cpumask at EQ creation phase · e4e3f24b

由 Leon Romanovsky 提交于 2月 23, 2021

The users of EQ are running their code on different CPUs and with
various affinity patterns. Move the cpumask setting close to their
actual usage.
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Reviewed-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

e4e3f24b

net: phy: micrel: ksz886x/ksz8081: add cabletest support · 49011e0c

由 Oleksij Rempel 提交于 6月 14, 2021

This patch support for cable test for the ksz886x switches and the
ksz8081 PHY.

The patch was tested on a KSZ8873RLL switch with following results:

- port 1:
  - provides invalid values, thus return -ENOTSUPP
    (Errata: DS80000830A: "LinkMD does not work on Port 1",
     http://ww1.microchip.com/downloads/en/DeviceDoc/KSZ8873-Errata-DS80000830A.pdf)

- port 2:
  - can detect distance
  - can detect open on each wire of pair A (wire 1 and 2)
  - can detect open only on one wire of pair B (only wire 3)
  - can detect short between wires of a pair (wires 1 + 2 or 3 + 6)
  - short between pairs is detected as open.
    For example short between wires 2 + 3 is detected as open.
Signed-off-by: NOleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

49011e0c

net: phy/dsa micrel/ksz886x add MDI-X support · 52939393

由 Oleksij Rempel 提交于 6月 14, 2021

Add support for MDI-X status and configuration
Signed-off-by: NOleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52939393

net: phy: micrel: move phy reg offsets to common header · ec4b94f9

由 Michael Grzeschik 提交于 6月 14, 2021

Some micrel devices share the same PHY register defines. This patch
moves them to one common header so other drivers can reuse them.
And reuse generic MII_* defines where possible.
Signed-off-by: NMichael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: NOleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec4b94f9

13 6月, 2021 3 次提交

net: qualcomm: rmnet: trailer value is a checksum · be754f64

由 Alex Elder 提交于 6月 12, 2021

The csum_value field in the rmnet_map_dl_csum_trailer structure is a
"real" Internet checksum.  It is a 16 bit value, in big endian format,
which represents an inverted ones' complement sum over pairs of bytes.

Make that clear by changing its type to __sum16.

This makes a typecast in rmnet_map_ipv4_dl_csum_trailer() and
another in rmnet_map_ipv6_dl_csum_trailer() unnecessary.
Signed-off-by: NAlex Elder <elder@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

be754f64

wwan: add interface creation support · 88b71053

由 Johannes Berg 提交于 6月 12, 2021

Add support to create (and destroy) interfaces via a new
rtnetlink kind "wwan". The responsible driver has to use
the new wwan_register_ops() to make this possible.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NSergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: NLoic Poulain <loic.poulain@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88b71053

net: phy: Add 25G BASE-R interface mode · a56c2868

由 Steen Hegelund 提交于 6月 11, 2021

Add 25gbase-r phy interface mode
Signed-off-by: NSteen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: NBjarni Jonasson <bjarni.jonasson@microchip.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Reviewed-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a56c2868

12 6月, 2021 18 次提交

net: pcs: xpcs: export xpcs_do_config and xpcs_link_up · a853c68e

由 Vladimir Oltean 提交于 6月 11, 2021

The sja1105 hardware has a quirk in that some changes require a switch
reset, which loses all configuration. When the reset is initiated,
everything needs to be reprogrammed, including the MACs and the PCS.
This is currently done in sja1105_static_config_reload() - we manually
call sja1105_adjust_port_config(), sja1105_sgmii_pcs_config() and
sja1105_sgmii_pcs_force_speed() which are all internal functions.

There is a desire for sja1105 to use the common xpcs driver, and that
means that the equivalents of those functions, xpcs_do_config() and
xpcs_link_up() respectively, will no longer be local functions.

Forcing phylink to retrigger a resolve somehow, say by doing dev_close()
followed by dev_open() is not really an option, because the CPU port
might have a PCS as well, and there is no net device which we can close
and reopen for that. Additionally, the dev_close/dev_open sequence might
force a renegotiation of the copper-side link for SGMII ports connected
to a PHY, and this is undesirable as well, because the switch reset is
much quicker than a PHY autoneg, so we would have a lot more downtime.

The only solution I see is for the sja1105 driver to keep doing what
it's doing, and that means we need to export the equivalents from xpcs
for sja1105_sgmii_pcs_config and sja1105_sgmii_pcs_force_speed, and call
them directly in sja1105_static_config_reload(). This will be done
during the conversion patch.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a853c68e

net: pcs: xpcs: add support for NXP SJA1110 · f7380bba

由 Vladimir Oltean 提交于 6月 11, 2021

The NXP SJA1110 switch integrates its own, non-Synopsys PMA, but it
manages it through the register space of the XPCS itself, in a small
register window inside MDIO_MMD_VEND2 from address 0x8030 to 0x806e.

This coincides with where the registers for the default Synopsys PMA
are, but the register definitions are of course not the same.

This situation is an odd hardware quirk, but the simplest way to manage
it is to drive the SJA1110's PMA from within the XPCS driver.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f7380bba

net: pcs: xpcs: add support for NXP SJA1105 · dd0721ea

由 Vladimir Oltean 提交于 6月 11, 2021

The NXP SJA1105 DSA switch integrates a Synopsys SGMII XPCS on port 4.
The generic code works fine, except there is an integration issue which
needs to be dealt with: in this switch, the XPCS is integrated with a
PMA that has the TX lane polarity inverted by default (PLUS is MINUS,
MINUS is PLUS).

To obtain normal non-inverted behavior, the TX lane polarity must be
inverted in the PCS, via the DIGITAL_CONTROL_2 register.

We introduce a pma_config() method in xpcs_compat which is called by the
phylink_pcs_config() implementation.

Also, the NXP SJA1105 returns all zeroes in the PHY ID registers 2 and 3.
We need to hack up an ad-hoc PHY ID (OUI is zero, device ID is 1) in
order for the XPCS driver to recognize it. This PHY ID is added to the
public include/linux/pcs/pcs-xpcs.h for that reason (for the sja1105
driver to be able to use it in a later patch).
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dd0721ea

net: pcs: xpcs: rename mdio_xpcs_args to dw_xpcs · 5673ef86

由 Vladimir Oltean 提交于 6月 11, 2021

The struct mdio_xpcs_args is reminiscent of when a similarly named
struct mdio_xpcs_ops existed. Now that that is removed, we can shorten
the name to dw_xpcs (dw for DesignWare).
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5673ef86

virtio/vsock: rest of SOCK_SEQPACKET support · 9ac841f5

由 Arseny Krasnov 提交于 6月 11, 2021

Small updates to make SOCK_SEQPACKET work:
1) Send SHUTDOWN on socket close for SEQPACKET type.
2) Set SEQPACKET packet type during send.
3) Set 'VIRTIO_VSOCK_SEQ_EOR' bit in flags for last
   packet of message.
4) Implement data check function for SEQPACKET.
5) Check for max datagram size.
Signed-off-by: NArseny Krasnov <arseny.krasnov@kaspersky.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ac841f5

virtio/vsock: dequeue callback for SOCK_SEQPACKET · 44931195

由 Arseny Krasnov 提交于 6月 11, 2021

Callback fetches RW packets from rx queue of socket until whole record
is copied(if user's buffer is full, user is not woken up). This is done
to not stall sender, because if we wake up user and it leaves syscall,
nobody will send credit update for rest of record, and sender will wait
for next enter of read syscall at receiver's side. So if user buffer is
full, we just send credit update and drop data.
Signed-off-by: NArseny Krasnov <arseny.krasnov@kaspersky.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

44931195

net: phylink: introduce phylink_fwnode_phy_connect() · 25396f68

由 Calvin Johnson 提交于 6月 11, 2021

Define phylink_fwnode_phy_connect() to connect phy specified by
a fwnode to a phylink instance.
Signed-off-by: NCalvin Johnson <calvin.johnson@oss.nxp.com>
Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
Acked-by: NGrant Likely <grant.likely@arm.com>
Reviewed-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25396f68

net: mdio: Add ACPI support code for mdio · 803ca24d

由 Calvin Johnson 提交于 6月 11, 2021

Define acpi_mdiobus_register() to Register mii_bus and create PHYs for
each ACPI child node.
Signed-off-by: NCalvin Johnson <calvin.johnson@oss.nxp.com>
Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
Acked-by: NRafael J. Wysocki <rafael@kernel.org>
Acked-by: NGrant Likely <grant.likely@arm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

803ca24d

ACPI: utils: Introduce acpi_get_local_address() · 7ec16433

由 Calvin Johnson 提交于 6月 11, 2021

Introduce a wrapper around the _ADR evaluation.
Signed-off-by: NCalvin Johnson <calvin.johnson@oss.nxp.com>
Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
Acked-by: NRafael J. Wysocki <rafael@kernel.org>
Acked-by: NGrant Likely <grant.likely@arm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7ec16433

net: mdiobus: Introduce fwnode_mdiobus_register_phy() · bc1bee3b

由 Calvin Johnson 提交于 6月 11, 2021

Introduce fwnode_mdiobus_register_phy() to register PHYs on the
mdiobus. From the compatible string, identify whether the PHY is
c45 and based on this create a PHY device instance which is
registered on the mdiobus.

Along with fwnode_mdiobus_register_phy() also introduce
fwnode_find_mii_timestamper() and fwnode_mdiobus_phy_device_register()
since they are needed.
While at it, also use the newly introduced fwnode operation in
of_mdiobus_phy_device_register().
Signed-off-by: NCalvin Johnson <calvin.johnson@oss.nxp.com>
Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
Acked-by: NGrant Likely <grant.likely@arm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc1bee3b

net: phy: Introduce fwnode_get_phy_id() · 114dea60

由 Calvin Johnson 提交于 6月 11, 2021

Extract phy_id from compatible string. This will be used by
fwnode_mdiobus_register_phy() to create phy device using the
phy_id.
Signed-off-by: NCalvin Johnson <calvin.johnson@oss.nxp.com>
Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
Acked-by: NGrant Likely <grant.likely@arm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

114dea60

net: phy: Introduce phy related fwnode functions · 425775ed

由 Calvin Johnson 提交于 6月 11, 2021

Define fwnode_phy_find_device() to iterate an mdiobus and find the
phy device of the provided phy fwnode. Additionally define
device_phy_find_device() to find phy device of provided device.

Define fwnode_get_phy_node() to get phy_node using named reference.
Signed-off-by: NCalvin Johnson <calvin.johnson@oss.nxp.com>
Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
Acked-by: NGrant Likely <grant.likely@arm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

425775ed

net: phy: Introduce fwnode_mdio_find_device() · 0fb16976

由 Calvin Johnson 提交于 6月 11, 2021

Define fwnode_mdio_find_device() to get a pointer to the
mdio_device from fwnode passed to the function.

Refactor of_mdio_find_device() to use fwnode_mdio_find_device().
Signed-off-by: NCalvin Johnson <calvin.johnson@oss.nxp.com>
Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
Acked-by: NGrant Likely <grant.likely@arm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0fb16976

net: dsa: sja1105: implement TX timestamping for SJA1110 · 566b18c8

由 Vladimir Oltean 提交于 6月 11, 2021

The TX timestamping procedure for SJA1105 is a bit unconventional
because the transmit procedure itself is unconventional.

Control packets (and therefore PTP as well) are transmitted to a
specific port in SJA1105 using "management routes" which must be written
over SPI to the switch. These are one-shot rules that match by
destination MAC address on traffic coming from the CPU port, and select
the precise destination port for that packet. So to transmit a packet
from NET_TX softirq context, we actually need to defer to a process
context so that we can perform that SPI write before we send the packet.
The DSA master dev_queue_xmit() runs in process context, and we poll
until the switch confirms it took the TX timestamp, then we annotate the
skb clone with that TX timestamp. This is why the sja1105 driver does
not need an skb queue for TX timestamping.

But the SJA1110 is a bit (not much!) more conventional, and you can
request 2-step TX timestamping through the DSA header, as well as give
the switch a cookie (timestamp ID) which it will give back to you when
it has the timestamp. So now we do need a queue for keeping the skb
clones until their TX timestamps become available.

The interesting part is that the metadata frames from SJA1105 haven't
disappeared completely. On SJA1105 they were used as follow-ups which
contained RX timestamps, but on SJA1110 they are actually TX completion
packets, which contain a variable (up to 32) array of timestamps.
Why an array? Because:
- not only is the TX timestamp on the egress port being communicated,
  but also the RX timestamp on the CPU port. Nice, but we don't care
  about that, so we ignore it.
- because a packet could be multicast to multiple egress ports, each
  port takes its own timestamp, and the TX completion packet contains
  the individual timestamps on each port.

This is unconventional because switches typically have a timestamping
FIFO and raise an interrupt, but this one doesn't. So the tagger needs
to detect and parse meta frames, and call into the main switch driver,
which pairs the timestamps with the skbs in the TX timestamping queue
which are waiting for one.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

566b18c8

net: dsa: add support for the SJA1110 native tagging protocol · 4913b8eb

由 Vladimir Oltean 提交于 6月 11, 2021

The SJA1110 has improved a few things compared to SJA1105:

- To send a control packet from the host port with SJA1105, one needed
  to program a one-shot "management route" over SPI. This is no longer
  true with SJA1110, you can actually send "in-band control extensions"
  in the packets sent by DSA, these are in fact DSA tags which contain
  the destination port and switch ID.

- When receiving a control packet from the switch with SJA1105, the
  source port and switch ID were written in bytes 3 and 4 of the
  destination MAC address of the frame (which was a very poor shot at a
  DSA header). If the control packet also had an RX timestamp, that
  timestamp was sent in an actual follow-up packet, so there were
  reordering concerns on multi-core/multi-queue DSA masters, where the
  metadata frame with the RX timestamp might get processed before the
  actual packet to which that timestamp belonged (there is no way to
  pair a packet to its timestamp other than the order in which they were
  received). On SJA1110, this is no longer true, control packets have
  the source port, switch ID and timestamp all in the DSA tags.

- Timestamps from the switch were partial: to get a 64-bit timestamp as
  required by PTP stacks, one would need to take the partial 24-bit or
  32-bit timestamp from the packet, then read the current PTP time very
  quickly, and then patch in the high bits of the current PTP time into
  the captured partial timestamp, to reconstruct what the full 64-bit
  timestamp must have been. That is awful because packet processing is
  done in NAPI context, but reading the current PTP time is done over
  SPI and therefore needs sleepable context.

But it also aggravated a few things:

- Not only is there a DSA header in SJA1110, but there is a DSA trailer
  in fact, too. So DSA needs to be extended to support taggers which
  have both a header and a trailer. Very unconventional - my understanding
  is that the trailer exists because the timestamps couldn't be prepared
  in time for putting them in the header area.

- Like SJA1105, not all packets sent to the CPU have the DSA tag added
  to them, only control packets do:

  * the ones which match the destination MAC filters/traps in
    MAC_FLTRES1 and MAC_FLTRES0
  * the ones which match FDB entries which have TRAP or TAKETS bits set

  So we could in theory hack something up to request the switch to take
  timestamps for all packets that reach the CPU, and those would be
  DSA-tagged and contain the source port / switch ID by virtue of the
  fact that there needs to be a timestamp trailer provided. BUT:

- The SJA1110 does not parse its own DSA tags in a way that is useful
  for routing in cross-chip topologies, a la Marvell. And the sja1105
  driver already supports cross-chip bridging from the SJA1105 days.
  It does that by automatically setting up the DSA links as VLAN trunks
  which contain all the necessary tag_8021q RX VLANs that must be
  communicated between the switches that span the same bridge. So when
  using tag_8021q on sja1105, it is possible to have 2 switches with
  ports sw0p0, sw0p1, sw1p0, sw1p1, and 2 VLAN-unaware bridges br0 and
  br1, and br0 can take sw0p0 and sw1p0, and br1 can take sw0p1 and
  sw1p1, and forwarding will happen according to the expected rules of
  the Linux bridge.
  We like that, and we don't want that to go away, so as a matter of
  fact, the SJA1110 tagger still needs to support tag_8021q.

So the sja1110 tagger is a hybrid between tag_8021q for data packets,
and the native hardware support for control packets.

On RX, packets have a 13-byte trailer if they contain an RX timestamp.
That trailer is padded in such a way that its byte 8 (the start of the
"residence time" field - not parsed by Linux because we don't care) is
aligned on a 16 byte boundary. So the padding has a variable length
between 0 and 15 bytes. The DSA header contains the offset of the
beginning of the padding relative to the beginning of the frame (and the
end of the padding is obviously the end of the packet minus 13 bytes,
the length of the trailer). So we discard it.

Packets which don't have a trailer contain the source port and switch ID
information in the header (they are "trap-to-host" packets). Packets
which have a trailer contain the source port and switch ID in the trailer.

On TX, the destination port mask and switch ID is always in the trailer,
so we always need to say in the header that a trailer is present.

The header needs a custom EtherType and this was chosen as 0xdadc, after
0xdada which is for Marvell and 0xdadb which is for VLANs in
VLAN-unaware mode on SJA1105 (and SJA1110 in fact too).

Because we use tag_8021q in concert with the native tagging protocol,
control packets will have 2 DSA tags.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4913b8eb

net: dsa: sja1105: make SJA1105_SKB_CB fit a full timestamp · 617ef8d9

由 Vladimir Oltean 提交于 6月 11, 2021

In SJA1105, RX timestamps for packets sent to the CPU are transmitted in
separate follow-up packets (metadata frames). These contain partial
timestamps (24 or 32 bits) which are kept in SJA1105_SKB_CB(skb)->meta_tstamp.

Thankfully, SJA1110 improved that, and the RX timestamps are now
transmitted in-band with the actual packet, in the timestamp trailer.
The RX timestamps are now full-width 64 bits.

Because we process the RX DSA tags in the rcv() method in the tagger,
but we would like to preserve the DSA code structure in that we populate
the skb timestamp in the port_rxtstamp() call which only happens later,
the implication is that we must somehow pass the 64-bit timestamp from
the rcv() method all the way to port_rxtstamp(). We can use the skb->cb
for that.

Rename the meta_tstamp from struct sja1105_skb_cb from "meta_tstamp" to
"tstamp", and increase its size to 64 bits.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

617ef8d9

net: dsa: tag_8021q: refactor RX VLAN parsing into a dedicated function · 233697b3

由 Vladimir Oltean 提交于 6月 11, 2021

The added value of this function is that it can deal with both the case
where the VLAN header is in the skb head, as well as in the offload field.
This is something I was not able to do using other functions in the
network stack.

Since both ocelot-8021q and sja1105 need to do the same stuff, let's
make it a common service provided by tag_8021q.

This is done as refactoring for the new SJA1110 tagger, which partly
uses tag_8021q as well (just like SJA1105), and will be the third caller.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

233697b3

net: dsa: tag_8021q: remove shim declarations · ab6a303c

由 Vladimir Oltean 提交于 6月 11, 2021

All users of tag_8021q select it in Kconfig, so shim functions are not
needed because it is not possible for it to be disabled and its callers
enabled.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab6a303c

11 6月, 2021 1 次提交

ice: add low level PTP clock access functions · 03cb4473

由 Jacob Keller 提交于 6月 09, 2021

Add the ice_ptp_hw.c file and some associated definitions to the ice
driver folder. This file contains basic low level definitions for
functions that interact with the device hardware.

For now, only E810-based devices are supported. The ice hardware
supports 2 major variants which have different PHYs with different
procedures necessary for interacting with the device clock.

Because the device captures timestamps in the PHY, each PHY has its own
internal timer. The timers are synchronized in hardware by first
preparing the source timer and the PHY timer shadow registers, and then
issuing a synchronization command. This ensures that both the source
timer and PHY timers are programmed simultaneously. The timers
themselves are all driven from the same oscillator source.

The functions in ice_ptp_hw.c abstract over the differences between how
the PHYs in E810 are programmed vs how the PHYs in E822 devices are
programmed. This series only implements E810 support, but E822 support
will be added in a future change.
Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
Tested-by: NTony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

03cb4473

10 6月, 2021 4 次提交

net/mlx5: Bridge, add offload infrastructure · 19e9bfa0

由 Vlad Buslov 提交于 4月 02, 2021

Create new files bridge.{c|h} in en/rep directory that implement bridge
interaction with representor netdevices and handle required
events/notifications, bridge.{c|h} in esw directory that implement all
necessary eswitch offloading infrastructure and works on vport/eswitch
level. Provide new kconfig MLX5_BRIDGE which is automatically selected when
both kernel bridge and mlx5 eswitch configs are enabled.

Provide basic infrastructure for bridge offloads:

- struct mlx5_esw_bridge_offloads - per-eswitch bridge offload structure
that encapsulates generic bridge-offloads data (notifier blocks, ingress
flow table/group, etc.) that is created/deleted on enable/disable eswitch
offloads.

- struct mlx5_esw_bridge - per-bridge structure that encapsulates
per-bridge data (reference counter, FDB, egress flow table/group, etc.)
that is created when first eswitch represetor is attached to new bridge and
deleted when last representor is removed from the bridge as a result of
NETDEV_CHANGEUPPER event.

The bridge tables are created with new priority FDB_BR_OFFLOAD in FDB
namespace. The new priority is between tc-miss and slow path priorities.
Priority consist of two levels: the ingress table that is global per
eswitch and matches incoming packets by src_mac/vid and redirects them to
next level (egress table) that is chosen according to ingress port bridge
membership and matches on dst_mac/vid in order to redirect packet to vport
according to the following diagram:

                +
                |
      +---------v----------+
      |                    |
      |   FDB_TC_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
      +---------v----------+
      |                    |
      |   FDB_FT_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
      +---------v----------+
      |                    |
      |    FDB_TC_MISS     |
      |                    |
      +---------+----------+
                |
+--------------------------------------+
|               |                      |
|        +------+                      |
|        |                             |
| +------v--------+   FDB_BR_OFFLOAD   |
| | INGRESS_TABLE |                    |
| +------+---+----+                    |
|        |   |      match              |
|        |   +---------+               |
|        |             |               |    +-------+
|        |     +-------v-------+ match |    |       |
|        |     | EGRESS_TABLE  +------------> vport |
|        |     +-------+-------+       |    |       |
|        |             |               |    +-------+
|        |    miss     |               |
|        +------+------+               |
|               |                      |
+--------------------------------------+
                |
                |
      +---------v----------+
      |                    |
      |   FDB_SLOW_PATH    |
      |                    |
      +---------+----------+
                |
                v
Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
Reviewed-by: NJianbo Liu <jianbol@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

19e9bfa0

net/mlx5: Create TC-miss priority and table · ec3be887

由 Vlad Buslov 提交于 3月 04, 2021

In order to adhere to kernel software datapath model bridge offloads must
come after TC and NF FDBs. Following patches in this series add new FDB
priority for bridge after FDB_FT_OFFLOAD. However, since netfilter offload
is implemented with unmanaged tables, its miss path is not automatically
connected to next priority and requires the code to manually connect with
slow table. To keep bridge offloads encapsulated and not mix it with
eswitch offloads, create a new FDB_TC_MISS priority between FDB_FT_OFFLOAD
and FDB_SLOW_PATH:

          +
          |
+---------v----------+
|                    |
|   FDB_TC_OFFLOAD   |
|                    |
+---------+----------+
          |
          |
          |
+---------v----------+
|                    |
|   FDB_FT_OFFLOAD   |
|                    |
+---------+----------+
          |
          |
          |
+---------v----------+
|                    |
|    FDB_TC_MISS     |
|                    |
+---------+----------+
          |
          |
          |
+---------v----------+
|                    |
|   FDB_SLOW_PATH    |
|                    |
+---------+----------+
          |
          v

Initialize the new priority with single default empty managed table and use
the table as TC/NF miss patch instead of slow table. This approach allows
bridge offloads to be created as new FDB namespace priority between
FDB_TC_MISS and FDB_SLOW_PATH without exposing its internal tables to any
other modules since miss path of managed TC-miss table is automatically
wired to next priority.
Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
Reviewed-by: NJianbo Liu <jianbol@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

ec3be887

net/mlx5: Added new parameters to reformat context · 3f3f05ab

由 Yevgeny Kliteynik 提交于 3月 09, 2021

Adding new reformat context type (INSERT_HEADER) requires adding two new
parameters to reformat context - reformat_param_0 and reformat_param_1.
As defined by HW spec, these parameters have different meaning for
different reformat context type.

The first parameter (reformat_param_0) is not new to HW spec, but it
wasn't used by any of the supported reformats. The second parameter
(reformat_param_1) is new to the HW spec - it was added to allow
supporting INSERT_HEADER.

For NSERT_HEADER, reformat_param_0 indicates the header used to
reference the location of the inserted header, and reformat_param_1
indicates the offset of the inserted header from the reference point
defined by reformat_param_0.
Signed-off-by: NYevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

3f3f05ab

net/mlx5: mlx5_ifc support for header insert/remove · 67133eaa

由 Yevgeny Kliteynik 提交于 3月 09, 2021

Add support for HCA caps 2 that contains capabilities for the new
insert/remove header actions.

Added the required definitions for supporting the new reformat type:
added packet reformat parameters, reformat anchors and definitions
to allow copy/set into the inserted EMD (Embedded MetaData) tag.
Signed-off-by: NYevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
Reviewed-by: NJianbo Liu <jianbol@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

67133eaa

09 6月, 2021 4 次提交

net: stmmac: explicitly deassert GMAC_AHB_RESET · e67f325e

由 Matthew Hagan 提交于 6月 08, 2021

We are currently assuming that GMAC_AHB_RESET will already be deasserted
by the bootloader. However if this has not been done, probing of the GMAC
will fail. To remedy this we must ensure GMAC_AHB_RESET has been deasserted
prior to probing.

v2 changes:
 - remove NULL condition check for stmmac_ahb_rst in stmmac_main.c
 - unwrap dev_err() message in stmmac_main.c
 - add PTR_ERR() around plat->stmmac_ahb_rst in stmmac_platform.c

v3 changes:
 - add error pointer to dev_err() output
 - add reset_control_assert(stmmac_ahb_rst) in stmmac_dvr_remove
 - revert PTR_ERR() around plat->stmmac_ahb_rst since this is performed
   on the returned value of ret by the calling function
Signed-off-by: NMatthew Hagan <mnhagan88@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e67f325e

net: wwan: make WWAN_PORT_MAX meaning less surprised · b64d76b7

由 Sergey Ryazanov 提交于 6月 08, 2021

It is quite unusual when some value can not be equal to a defined range
max value. Also most subsystems defines FOO_TYPE_MAX as a maximum valid
value. So turn the WAN_PORT_MAX meaning from the number of supported
port types to the maximum valid port type.
Signed-off-by: NSergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: NLoic Poulain <loic.poulain@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b64d76b7

net: stmmac: enable Intel mGbE 2.5Gbps link speed · 46682cb8

由 Voon Weifeng 提交于 6月 08, 2021

The Intel mGbE supports 2.5Gbps link speed by increasing the clock rate by
2.5 times of the original rate. In this mode, the serdes/PHY operates at a
serial baud rate of 3.125 Gbps and the PCS data path and GMII interface of
the MAC operate at 312.5 MHz instead of 125 MHz.

For Intel mGbE, the overclocking of 2.5 times clock rate to support 2.5G is
only able to be configured in the BIOS during boot time. Kernel driver has
no access to modify the clock rate for 1Gbps/2.5G mode. The way to
determined the current 1G/2.5G mode is by reading a dedicated adhoc
register through mdio bus. In short, after the system boot up, it is either
in 1G mode or 2.5G mode which not able to be changed on the fly.

Compared to 1G mode, the 2.5G mode selects the 2500BASEX as PHY interface and
disables the xpcs_an_inband. This is to cater for some PHYs that only
supports 2500BASEX PHY interface with no autonegotiation.

v2: remove MAC supported link speed masking
v3: Restructure to introduce intel_speed_mode_2500() to read serdes registers
for max speed supported and select the appropritate configuration.
Use max_speed to determine the supported link speed mask.
Signed-off-by: NVoon Weifeng <weifeng.voon@intel.com>
Signed-off-by: NMichael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Reviewed-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46682cb8

net: pcs: add 2500BASEX support for Intel mGbE controller · f27abde3

由 Voon Weifeng 提交于 6月 08, 2021

XPCS IP supports 2500BASEX as PHY interface. It is configured as
autonegotiation disable to cater for PHYs that does not supports 2500BASEX
autonegotiation.

v2: Add supported link speed masking.
v3: Restructure to introduce xpcs_config_2500basex() used to configure the
    xpcs for 2.5G speeds. Added 2500BASEX specific information for
    configuration.
v4: Fix indentation error
Signed-off-by: NVoon Weifeng <weifeng.voon@intel.com>
Signed-off-by: NMichael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f27abde3

08 6月, 2021 2 次提交

page_pool: Allow drivers to hint on SKB recycling · 6a5bcd84

由 Ilias Apalodimas 提交于 6月 07, 2021

Up to now several high speed NICs have custom mechanisms of recycling
the allocated memory they use for their payloads.
Our page_pool API already has recycling capabilities that are always
used when we are running in 'XDP mode'. So let's tweak the API and the
kernel network stack slightly and allow the recycling to happen even
during the standard operation.
The API doesn't take into account 'split page' policies used by those
drivers currently, but can be extended once we have users for that.

The idea is to be able to intercept the packet on skb_release_data().
If it's a buffer coming from our page_pool API recycle it back to the
pool for further usage or just release the packet entirely.

To achieve that we introduce a bit in struct sk_buff (pp_recycle:1) and
a field in struct page (page->pp) to store the page_pool pointer.
Storing the information in page->pp allows us to recycle both SKBs and
their fragments.
We could have skipped the skb bit entirely, since identical information
can bederived from struct page. However, in an effort to affect the free path
as less as possible, reading a single bit in the skb which is already
in cache, is better that trying to derive identical information for the
page stored data.

The driver or page_pool has to take care of the sync operations on it's own
during the buffer recycling since the buffer is, after opting-in to the
recycling, never unmapped.

Since the gain on the drivers depends on the architecture, we are not
enabling recycling by default if the page_pool API is used on a driver.
In order to enable recycling the driver must call skb_mark_for_recycle()
to store the information we need for recycling in page->pp and
enabling the recycling bit, or page_pool_store_mem_info() for a fragment.
Co-developed-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: NMatteo Croce <mcroce@microsoft.com>
Signed-off-by: NMatteo Croce <mcroce@microsoft.com>
Signed-off-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a5bcd84

skbuff: add a parameter to __skb_frag_unref · c420c989

由 Matteo Croce 提交于 6月 07, 2021

This is a prerequisite patch, the next one is enabling recycling of
skbs and fragments. Add an extra argument on __skb_frag_unref() to
handle recycling, and update the current users of the function with that.
Signed-off-by: NMatteo Croce <mcroce@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c420c989

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功