1. 10 December 2022, 1 commit
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · 043cd1e2
      Committed by Jakub Kicinski
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-12-08 (ice)
      
      Jacob Keller says:
      
      This series of patches primarily consists of changes to fix some corner
      cases that can cause Tx timestamp failures. The issues were discovered and
      reported by Siddaraju DH and primarily affect E822 hardware, though this
      series also includes some improvements that affect E810 hardware as well.
      
      The primary issue is regarding the way that E822 determines when to generate
      timestamp interrupts. If the driver reads timestamp indexes which do not
      have a valid timestamp, the E822 interrupt tracking logic can get stuck.
      This is due to the way that E822 hardware tracks timestamp index reads
      internally. I was previously unaware of this behavior as it is significantly
      different in E810 hardware.
      
      Most of the fixes target refactors to ensure that the ice driver does not
      read timestamp indexes which are not valid on E822 hardware. This is done by
      using the Tx timestamp ready bitmap register from the PHY. This register
      indicates what timestamp indexes have outstanding timestamps waiting to be
      captured.
      
      Care must be taken in all cases where we read the timestamp registers, and
      thus all flows which might have read these registers are refactored. The
      ice_ptp_tx_tstamp function is modified to consolidate as much of the logic
      relating to these registers as possible. It now handles discarding stale
      timestamps which are old or which occurred after a PHC time update. This
      replaces previously standalone thread functions like the periodic work
      function and the ice_ptp_flush_tx_tracker function.
      
      In addition, some minor cleanups noticed while writing these refactors are
      included.
      
      The remaining patches refactor the E822 implementation to remove the
      "bypass" mode for timestamps. The E822 hardware has the ability to provide a
      more precise timestamp by making use of measurements of the precise way that
      packets flow through the hardware pipeline. These measurements are known as
      "Vernier" calibration. The "bypass" mode disables many of these measurements
      in favor of a faster start up time for Tx and Rx timestamping. Instead, once
      these measurements were captured, the driver tries to reconfigure the PHY to
      enable the vernier calibrations.
      
      Unfortunately this recalibration does not work. Testing indicates that the
      PHY simply remains in bypass mode without the increased timestamp precision.
      Remove the attempt at recalibration and always use vernier mode. This has
      one disadvantage that Tx and Rx timestamps cannot begin until after at least
      one packet of that type goes through the hardware pipeline. Because of this,
      further refactor the driver to separate Tx and Rx vernier calibration.
      Complete the Tx and Rx independently, enabling the appropriate type of
      timestamp as soon as the relevant packet has traversed the hardware
      pipeline. This was reported by Milena Olech.
      
      Note that although these might be considered "bug fixes", the changes
      required to appropriately resolve these issues are large. Thus it does
      not feel suitable to send this series to net.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
        ice: reschedule ice_ptp_wait_for_offset_valid during reset
        ice: make Tx and Rx vernier offset calibration independent
        ice: only check set bits in ice_ptp_flush_tx_tracker
        ice: handle flushing stale Tx timestamps in ice_ptp_tx_tstamp
        ice: cleanup allocations in ice_ptp_alloc_tx_tracker
        ice: protect init and calibrating check in ice_ptp_request_ts
        ice: synchronize the misc IRQ when tearing down Tx tracker
        ice: check Tx timestamp memory register for ready timestamps
        ice: handle discarding old Tx requests in ice_ptp_tx_tstamp
        ice: always call ice_ptp_link_change and make it void
        ice: fix misuse of "link err" with "link status"
        ice: Reset TS memory for all quads
        ice: Remove the E822 vernier "bypass" logic
        ice: Use more generic names for ice_ptp_tx fields
      ====================
      
      Link: https://lore.kernel.org/r/20221208213932.1274143-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      043cd1e2
  2. 09 December 2022, 39 commits
    • net: openvswitch: Add support to count upcall packets · 1933ea36
      Committed by wangchuanlei
      Add support to count upcall packets: when the openvswitch kernel module
      performs an upcall, count the packets for which the upcall succeeded or
      failed. This gives a better view of how many packets are upcalled on
      each interface.
      Signed-off-by: wangchuanlei <wangchuanlei@inspur.com>
      Acked-by: Eelco Chaudron <echaudro@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1933ea36
    • rhashtable: Allow rhashtable to be used from irq-safe contexts · e47877c7
      Committed by Tejun Heo
      rhashtable currently only does bh-safe synchronization making it impossible
      to use from irq-safe contexts. Switch it to use irq-safe synchronization to
      remove the restriction.
      
      v2: Update the lock functions to return the ulong flags value and unlock
          functions to take the value directly instead of passing around the
          pointer. Suggested by Linus.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: David Vernet <dvernet@meta.com>
      Acked-by: Josh Don <joshdon@google.com>
      Acked-by: Hao Luo <haoluo@google.com>
      Acked-by: Barret Rhoden <brho@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e47877c7
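      A userspace analogy of the v2 API shape mentioned above (this is not the
      kernel rhashtable code; the names are made up): the lock helper saves the
      previous "interrupt" state, here approximated by the signal mask, and
      returns it by value, while the unlock helper takes that value directly
      instead of a pointer to it.

        #include <pthread.h>
        #include <signal.h>
        #include <stdio.h>

        static pthread_mutex_t bucket_lock = PTHREAD_MUTEX_INITIALIZER;

        /* "irqsave": mask signals, take the lock, hand the old mask back. */
        static sigset_t lock_bucket_irqsave(void)
        {
            sigset_t all, old;

            sigfillset(&all);
            pthread_sigmask(SIG_BLOCK, &all, &old);
            pthread_mutex_lock(&bucket_lock);
            return old;
        }

        /* "irqrestore": drop the lock, then restore the saved mask. */
        static void unlock_bucket_irqrestore(sigset_t old)
        {
            pthread_mutex_unlock(&bucket_lock);
            pthread_sigmask(SIG_SETMASK, &old, NULL);
        }

        int main(void)
        {
            sigset_t flags = lock_bucket_irqsave();

            puts("critical section with signals masked");
            unlock_bucket_irqrestore(flags);
            return 0;
        }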
    • Merge branch 'net-sched-retpoline' · b602d003
      Committed by David S. Miller
      Pedro Tammela says:
      
      ====================
      net/sched: retpoline wrappers for tc
      
      In tc, all qdiscs, classifiers and actions can be compiled as modules.
      This results today in indirect calls in all transitions in the tc hierarchy.
      Due to CONFIG_RETPOLINE, CPUs with mitigations=on might pay an extra cost on
      indirect calls. For newer Intel cpus with IBRS the extra cost is
      nonexistent, but AMD Zen cpus and older x86 cpus still go through the
      retpoline thunk.
      
      Known built-in symbols can be optimized into direct calls, thus
      avoiding the retpoline thunk. So far, tc has not been leveraging this
      build information and leaving out a performance optimization for some
      CPUs. In this series we wire up 'tcf_classify()' and 'tcf_action_exec()'
      with direct calls when known modules are compiled as built-in as an
      opt-in optimization.
      
      We measured these changes in one AMD Zen 4 cpu (Retpoline), one AMD Zen 3 cpu (Retpoline),
      one Intel 10th Gen CPU (IBRS), one Intel 3rd Gen cpu (Retpoline) and one
      Intel Xeon CPU (IBRS) using pktgen with 64b udp packets. Our test setup is a
      dummy device with clsact and matchall in a kernel compiled with every
      tc module as built-in.  We observed a 3-8% speed up on the retpoline CPUs,
      when going through 1 tc filter, and a 60-100% speed up when going through 100 filters.
      For the IBRS CPUs we observed a 1-2% degradation in both scenarios; we believe
      the extra branch checks introduced a small overhead. We therefore added a
      static key that bypasses the wrapper on kernels that are compiled with
      CONFIG_RETPOLINE but are not running with the retpoline mitigation enabled.
      
      1 filter:
      CPU        | before (pps) | after (pps) | diff
      R9 7950X   | 5914980      | 6380227     | +7.8%
      R9 5950X   | 4237838      | 4412241     | +4.1%
      R9 5950X   | 4265287      | 4413757     | +3.4%   [*]
      i5-3337U   | 1580565      | 1682406     | +6.4%
      i5-10210U  | 3006074      | 3006857     | +0.0%
      i5-10210U  | 3160245      | 3179945     | +0.6%   [*]
      Xeon 6230R | 3196906      | 3197059      | +0.0%
      Xeon 6230R | 3190392      | 3196153     | +0.01%  [*]
      
      100 filters:
      CPU        | before (pps) | after (pps) | diff
      R9 7950X   | 373598       | 820396      | +119.59%
      R9 5950X   | 313469       | 633303      | +102.03%
      R9 5950X   | 313797       | 633150      | +101.77% [*]
      i5-3337U   | 127454       | 211210      | +65.71%
      i5-10210U  | 389259       | 381765      | -1.9%
      i5-10210U  | 408812       | 412730      | +0.9%    [*]
      Xeon 6230R | 415420       | 406612      | -2.1%
      Xeon 6230R | 416705       | 405869      | -2.6%    [*]
      
      [*] In these tests we ran pktgen with clone set to 1000.
      
      On the 7950X system we also tested the impact of a filter's placement in the
      iteration order: first by compiling a kernel with the filter under test being
      the first one in the static iteration, then repeating the test with it being
      the last (of the 15 classifiers existing today). We saw a difference of
      +0.5-1% in pps between being first in the iteration and being last.
      Therefore we order the classifiers and actions according to relevance, per
      our current thinking.
      
      v5->v6:
      - Address Eric Dumazet suggestions
      
      v4->v5:
      - Rebase
      
      v3->v4:
      - Address Eric Dumazet suggestions
      
      v2->v3:
      - Address suggestions by Jakub, Paolo and Eric
      - Dropped RFC tag (I forgot to add it on v2)
      
      v1->v2:
      - Fix build errors found by the bots
      - Address Kuniyuki Iwashima suggestions
      
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b602d003
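      As a toy userspace illustration of the optimization described above (this
      is not the actual tc wrapper; all names below are invented for the
      sketch): when the function pointer matches a known built-in handler, the
      wrapper calls it directly so the compiler emits a direct call and no
      retpoline thunk is involved; otherwise it falls back to the indirect call.

        #include <stdio.h>

        /* Stand-in for a classifier that is known to be built into the kernel. */
        static int builtin_classify(int skb_len)
        {
            return skb_len > 100;
        }

        /* Stand-in for a classifier loaded as a module. */
        static int module_classify(int skb_len)
        {
            return skb_len > 1000;
        }

        /* Wrapper: direct call for the known built-in, indirect call otherwise. */
        static int classify(int (*fn)(int), int skb_len)
        {
            if (fn == builtin_classify)
                return builtin_classify(skb_len);  /* direct call */
            return fn(skb_len);                    /* indirect call */
        }

        int main(void)
        {
            printf("%d %d\n", classify(builtin_classify, 200),
                   classify(module_classify, 200));
            return 0;
        }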
    • net/sched: avoid indirect classify functions on retpoline kernels · 9f3101dc
      Committed by Pedro Tammela
      Expose the necessary tc classifier functions and wire up cls_api to use
      direct calls in retpoline kernels.
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Victor Nogueira <victor@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9f3101dc
    • net/sched: avoid indirect act functions on retpoline kernels · 871cf386
      Committed by Pedro Tammela
      Expose the necessary tc act functions and wire up act_api to use
      direct calls in retpoline kernels.
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Victor Nogueira <victor@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      871cf386
    • net/sched: add retpoline wrapper for tc · 7f0e8102
      Committed by Pedro Tammela
      On kernels using retpoline as a spectrev2 mitigation,
      optimize actions and filters that are compiled as built-ins into a direct call.
      
      In subsequent patches we expose the classifier and action functions
      and wire up the wrapper into tc.
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Victor Nogueira <victor@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7f0e8102
    • net/sched: move struct action_ops definition out of ifdef · 2a7d228f
      Committed by Pedro Tammela
      The type definition should be visible even in configurations not using
      CONFIG_NET_CLS_ACT.
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: Victor Nogueira <victor@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a7d228f
    • net: phy: remove redundant "depends on" lines · 0bdff115
      Committed by Randy Dunlap
      Delete a few lines of "depends on PHYLIB" since they are inside
      an "if PHYLIB / endif # PHYLIB" block, i.e., they are redundant
      and the other 50+ drivers there don't use "depends on PHYLIB"
      since it is not needed.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Link: https://lore.kernel.org/r/20221207044257.30036-1-rdunlap@infradead.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0bdff115
    • net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP · b534dc46
      Committed by Willem de Bruijn
      Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP from
      write_seq sockets instead of snd_una.
      
      This should have been the behavior from the start. Because processes
      may now exist that rely on the established behavior, do not change
      behavior of the existing option, but add the right behavior with a new
      flag. It is encouraged to always set SOF_TIMESTAMPING_OPT_ID_TCP on
      stream sockets along with the existing SOF_TIMESTAMPING_OPT_ID.
      
      Intuitively the contract is that the counter is zero after the
      setsockopt, so that the next write N results in a notification for
      the last byte N - 1.
      
      On idle sockets snd_una == write_seq and this holds for both. But on
      sockets with data in transmission, snd_una records the unacked offset
      in the stream. This depends on the ACK response from the peer. A
      process cannot learn this in a race free manner (ioctl SIOCOUTQ is one
      racy approach).
      
      write_seq records the offset at the last byte written by the process.
      This is a better starting point. It matches the intuitive contract in
      all circumstances, unaffected by external behavior.
      
      The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
      Move the field in struct sock to avoid growing the socket (for some
      common CONFIG variants). The UAPI interface so_timestamping.flags is
      already int, so 32 bits wide.
      Reported-by: Sotirios Delimanolis <sotodel@meta.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20221207143701.29861-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b534dc46
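      A minimal usage sketch (not taken from the patch) of opting in to the new
      flag on a TCP socket; the fallback defines below are assumptions for the
      case where the installed UAPI headers predate this change:

        #include <stdio.h>
        #include <sys/socket.h>
        #include <linux/net_tstamp.h>

        #ifndef SO_TIMESTAMPING
        #define SO_TIMESTAMPING 37                     /* value on most architectures */
        #endif
        #ifndef SOF_TIMESTAMPING_OPT_ID_TCP
        #define SOF_TIMESTAMPING_OPT_ID_TCP (1 << 16)  /* added by this patch */
        #endif

        int main(void)
        {
            int fd = socket(AF_INET, SOCK_STREAM, 0);
            unsigned int flags = SOF_TIMESTAMPING_TX_ACK |
                                 SOF_TIMESTAMPING_SOFTWARE |
                                 SOF_TIMESTAMPING_OPT_TSONLY |
                                 SOF_TIMESTAMPING_OPT_ID |
                                 SOF_TIMESTAMPING_OPT_ID_TCP;

            if (fd < 0) {
                perror("socket");
                return 1;
            }
            /* Older kernels reject the unknown bit, so check the result. */
            if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags)))
                perror("setsockopt(SO_TIMESTAMPING)");
            return 0;
        }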
    • Merge branch 'fix-possible-deadlock-during-wed-attach' · ecd6df3c
      Committed by Jakub Kicinski
      Lorenzo Bianconi says:
      
      ====================
      fix possible deadlock during WED attach
      
      Fix a possible deadlock in mtk_wed_attach if the mtk_wed_wo_init routine
      fails. Check that the wo pointer is properly allocated before running
      mtk_wed_wo_reset() and mtk_wed_wo_deinit().
      ====================
      
      Link: https://lore.kernel.org/r/cover.1670421354.git.lorenzo@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ecd6df3c
    • net: ethernet: mtk_wed: fix possible deadlock if mtk_wed_wo_init fails · 587585e1
      Committed by Lorenzo Bianconi
      Introduce __mtk_wed_detach() in order to avoid a deadlock in the
      mtk_wed_attach routine if mtk_wed_wo_init fails, since both
      mtk_wed_attach and mtk_wed_detach run holding the hw_lock mutex.
      
      Fixes: 4c5de09e ("net: ethernet: mtk_wed: add configure wed wo support")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      587585e1
    • net: ethernet: mtk_wed: fix some possible NULL pointer dereferences · c79e0af5
      Committed by Lorenzo Bianconi
      Fix a possible NULL pointer dereference in the mtk_wed_detach routine by
      checking that the wo pointer is properly allocated before running
      mtk_wed_wo_reset() and mtk_wed_wo_deinit().
      Even if it is just a theoretical issue at the moment, also check that the
      wo pointer is not NULL in mtk_wed_mcu_msg_update.
      Moreover, honor the mtk_wed_mcu_send_msg return value in mtk_wed_wo_reset().
      
      Fixes: 79968444 ("net: ethernet: mtk_wed: introduce wed wo support")
      Fixes: 4c5de09e ("net: ethernet: mtk_wed: add configure wed wo support")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c79e0af5
    • nfp: Fix spelling mistake "tha" -> "the" · 3df96774
      Committed by Colin Ian King
      There is a spelling mistake in a nn_dp_warn message. Fix it.
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20221207094312.2281493-1-colin.i.king@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3df96774
    • selftests: net: Fix O=dir builds · 17961a37
      Committed by Björn Töpel
      The BPF Makefile in net/bpf did incorrect path substitution for O=dir
      builds, e.g.
      
        make O=/tmp/kselftest headers
        make O=/tmp/kselftest -C tools/testing/selftests
      
      would fail to build the net/ selftests [1] with
      
        clang-16: error: no such file or directory: 'kselftest/net/bpf/nat6to4.c'
        clang-16: error: no input files
      
      Add a pattern prerequisite and an order-only-prerequisite (for
      creating the directory), to resolve the issue.
      
      [1] https://lore.kernel.org/all/202212060009.34CkQmCN-lkp@intel.com/
      Reported-by: kernel test robot <lkp@intel.com>
      Fixes: 837a3d66 ("selftests: net: Add cross-compilation support for BPF programs")
      Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
      Link: https://lore.kernel.org/r/20221206102838.272584-1-bjorn@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      17961a37
    • Merge branch 'mlxsw-add-spectrum-1-ip6gre-support' · ce87a957
      Committed by Jakub Kicinski
      Petr Machata says:
      
      ====================
      mlxsw: Add Spectrum-1 ip6gre support
      
      Ido Schimmel writes:
      
      Currently, mlxsw only supports ip6gre offload on Spectrum-2 and newer
      ASICs. Spectrum-1 can also offload ip6gre tunnels, but it needs double
      entry router interfaces (RIFs) for the RIFs representing these tunnels.
      In addition, the RIF index needs to be even. This is handled in
      patches #1-#3.
      
      The implementation can otherwise be shared between all Spectrum
      generations. This is handled in patches #4-#5.
      
      Patch #6 moves a mlxsw ip6gre selftest to a shared directory, as ip6gre
      is no longer only supported on Spectrum-2 and newer ASICs.
      
      This work is motivated by users that require multiple GRE tunnels that
      all share the same underlay VRF. Currently, mlxsw only supports
      decapsulation based on the underlay destination IP (i.e., not taking the
      GRE key into account), so users need to configure these tunnels with
      different source IPs, and IPv6 addresses are easier to spare than IPv4 ones.
      
      Tested using existing ip6gre forwarding selftests.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1670414573.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ce87a957
    • selftests: mlxsw: Move IPv6 decap_error test to shared directory · db401875
      Committed by Ido Schimmel
      Now that Spectrum-1 gained ip6gre support we can move the test out of
      the Spectrum-2 directory.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      db401875
    • mlxsw: spectrum_ipip: Add Spectrum-1 ip6gre support · 7ec53643
      Committed by Ido Schimmel
      As explained in the previous patch, the existing Spectrum-2 ip6gre
      implementation can be reused for Spectrum-1. Change the Spectrum-1
      ip6gre operations structure to use the common operations.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7ec53643
    • mlxsw: spectrum_ipip: Rename Spectrum-2 ip6gre operations · ab30e4d4
      Committed by Ido Schimmel
      There are two main differences between Spectrum-1 and newer ASICs in
      terms of IP-in-IP support:
      
      1. In Spectrum-1, RIFs representing ip6gre tunnels require two entries
         in the RIF table.
      
      2. In Spectrum-2 and newer ASICs, packets ingress the underlay (during
         encapsulation) and egress the underlay (during decapsulation) via a
         special generic loopback RIF.
      
      The first difference was handled in previous patches by adding the
      'double_rif_entry' field to the Spectrum-1 operations structure of
      ip6gre RIFs. The second difference is handled during RIF creation, by
      only creating a generic loopback RIF in Spectrum-2 and newer ASICs.
      
      Therefore, the ip6gre operations can be shared between Spectrum-1 and
      newer ASICs in a similar fashion to how the ipgre operations are shared.
      
      Rename the operations to not be Spectrum-2 specific and move them
      earlier in the file so that they could later be used for Spectrum-1.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ab30e4d4
    • mlxsw: spectrum_router: Add support for double entry RIFs · 5ca1b208
      Committed by Ido Schimmel
      In Spectrum-1, loopback router interfaces (RIFs) used for IP-in-IP
      encapsulation with an IPv6 underlay require two RIF entries and the RIF
      index must be even.
      
      Prepare for this change by extending the RIF parameters structure with a
      'double_entry' field that indicates if the RIF being created requires
      two RIF entries or not. Only set it for RIFs representing ip6gre tunnels
      in Spectrum-1.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5ca1b208
    • mlxsw: spectrum_router: Parametrize RIF allocation size · 1a2f65b4
      Committed by Ido Schimmel
      Currently, each router interface (RIF) consumes one entry in the RIFs
      table. This is going to change in subsequent patches where some RIFs
      will consume two table entries.
      
      Prepare for this change by parametrizing the RIF allocation size. For
      now, always pass '1'.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1a2f65b4
    • mlxsw: spectrum_router: Use gen_pool for RIF index allocation · 40ef76de
      Committed by Ido Schimmel
      Currently, each router interface (RIF) consumes one entry in the RIFs
      table and there are no alignment constraints. This is going to change in
      subsequent patches where some RIFs will consume two table entries and
      their indexes will need to be aligned to the allocation size (even).
      
      Prepare for this change by converting the RIF index allocation to use
      gen_pool with the 'gen_pool_first_fit_order_align' algorithm.
      
      No Kconfig changes necessary as mlxsw already selects
      'GENERIC_ALLOCATOR'.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      40ef76de
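      A toy allocator (not gen_pool itself; written only to mirror the
      behaviour described above): a double-entry RIF gets two consecutive slots
      starting at an even index, while single-entry RIFs keep taking one slot.

        #include <stdio.h>

        #define RIF_COUNT 16
        static unsigned char used[RIF_COUNT];

        /* Find 'size' consecutive free slots whose start is aligned to 'size'
         * (size is 1 or 2 here), as gen_pool_first_fit_order_align would. */
        static int rif_index_alloc(int size)
        {
            for (int i = 0; i + size <= RIF_COUNT; i += size) {
                int free_run = 1;

                for (int j = 0; j < size; j++)
                    free_run &= !used[i + j];
                if (!free_run)
                    continue;
                for (int j = 0; j < size; j++)
                    used[i + j] = 1;
                return i;
            }
            return -1;  /* RIF table exhausted */
        }

        int main(void)
        {
            printf("single-entry RIF at %d\n", rif_index_alloc(1)); /* 0 */
            printf("double-entry RIF at %d\n", rif_index_alloc(2)); /* 2, even */
            printf("single-entry RIF at %d\n", rif_index_alloc(1)); /* 1, backfills */
            return 0;
        }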
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 837e8ac8
      Committed by Jakub Kicinski
      No conflicts.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      837e8ac8
    • Merge tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 010b6761
      Committed by Linus Torvalds
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth, can and netfilter.
      
        Current release - new code bugs:
      
         - bonding: ipv6: correct address used in Neighbour Advertisement
           parsing (src vs dst typo)
      
         - fec: properly scope IRQ coalesce setup during link up to supported
           chips only
      
        Previous releases - regressions:
      
         - Bluetooth fixes for fake CSR clones (knockoffs):
             - re-add ERR_DATA_REPORTING quirk
             - fix crash when device is replugged
      
         - Bluetooth:
             - silence a user-triggerable dmesg error message
             - L2CAP: fix u8 overflow, oob access
             - correct vendor codec definition
             - fix support for Read Local Supported Codecs V2
      
         - ti: am65-cpsw: fix RGMII configuration at SPEED_10
      
         - mana: fix race on per-CQ variable NAPI work_done
      
        Previous releases - always broken:
      
         - af_unix: diag: fetch user_ns from in_skb in unix_diag_get_exact(),
           avoid null-deref
      
         - af_can: fix NULL pointer dereference in can_rcv_filter
      
         - can: slcan: fix UAF with a freed work
      
         - can: can327: flush TX_work on ldisc .close()
      
         - macsec: add missing attribute validation for offload
      
         - ipv6: avoid use-after-free in ip6_fragment()
      
         - nft_set_pipapo: actually validate intervals in fields after the
           first one
      
         - mvneta: prevent oob access in mvneta_config_rss()
      
         - ipv4: fix incorrect route flushing when table ID 0 is used, or when
           source address is deleted
      
         - phy: mxl-gpy: add workaround for IRQ bug on GPY215B and GPY215C"
      
      * tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
        net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing()
        s390/qeth: fix use-after-free in hsci
        macsec: add missing attribute validation for offload
        net: mvneta: Fix an out of bounds check
        net: thunderbolt: fix memory leak in tbnet_open()
        ipv6: avoid use-after-free in ip6_fragment()
        net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
        net: phy: mxl-gpy: add MDINT workaround
        net: dsa: mv88e6xxx: accept phy-mode = "internal" for internal PHY ports
        xen/netback: don't call kfree_skb() under spin_lock_irqsave()
        dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove()
        ethernet: aeroflex: fix potential skb leak in greth_init_rings()
        tipc: call tipc_lxc_xmit without holding node_read_lock
        can: esd_usb: Allow REC and TEC to return to zero
        can: can327: flush TX_work on ldisc .close()
        can: slcan: fix freed work crash
        can: af_can: fix NULL pointer dereference in can_rcv_filter
        net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions()
        ipv4: Fix incorrect route flushing when table ID 0 is used
        ipv4: Fix incorrect route flushing when source address is deleted
        ...
      010b6761
    • Merge branch 'mlx4-better-big-tcp-support' · ff36c447
      Committed by Jakub Kicinski
      Eric Dumazet says:
      
      ====================
      mlx4: better BIG-TCP support
      
      mlx4 uses a bounce buffer in TX whenever the tx descriptors
      wrap around the right edge of the ring.
      
      Size of this bounce buffer was hard coded and can be
      increased if/when needed.
      ====================
      
      Link: https://lore.kernel.org/r/20221207141237.2575012-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ff36c447
    • net/mlx4: small optimization in mlx4_en_xmit() · 0e706f79
      Committed by Eric Dumazet
      The test against MLX4_MAX_DESC_TXBBS only matters if the TX
      bounce buffer is going to be used.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0e706f79
    • net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS · 26782aad
      Committed by Eric Dumazet
      Google production kernel has increased MAX_SKB_FRAGS to 45
      for BIG-TCP rollout.
      
      Unfortunately mlx4 TX bounce buffer is not big enough whenever
      an skb has up to 45 page fragments.
      
      This can happen often with TCP TX zero copy, as one frag usually
      holds 4096 bytes of payload (order-0 page).
      
      Tested:
       Kernel built with MAX_SKB_FRAGS=45
       ip link set dev eth0 gso_max_size 185000
       netperf -t TCP_SENDFILE
      
      I made sure that "ethtool -G eth0 tx 64" was properly working,
      ring->full_size being set to 15.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Wei Wang <weiwan@google.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      26782aad
    • net/mlx4: rename two constants · 35f31ff0
      Committed by Eric Dumazet
      MAX_DESC_SIZE is really the size of the bounce buffer used
      when reaching the right side of the TX ring buffer.
      
      MAX_DESC_TXBBS gets an MLX4_ prefix.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      35f31ff0
    • ice: reschedule ice_ptp_wait_for_offset_valid during reset · 95af1f1c
      Committed by Jacob Keller
      If the ice_ptp_wait_for_offset_valid function is scheduled to run while the
      driver is resetting, it will exit without completing calibration. The work
      function gets scheduled by ice_ptp_port_phy_restart which will be called as
      part of the reset recovery process.
      
      It is possible for the first execution to occur before the driver has
      completely cleared its resetting flags. Ensure calibration completes by
      rescheduling the task until reset is fully completed.
      Reported-by: Siddaraju DH <siddaraju.dh@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      95af1f1c
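      A rough userspace sketch of the rescheduling pattern described above (the
      names are hypothetical, not the ice driver's): if the work runs while the
      reset flag is still set, it asks to be run again instead of giving up on
      calibration.

        #include <stdbool.h>
        #include <stdio.h>

        static bool resetting = true;
        static bool calibrated;

        /* Returns true once calibration could actually be performed. */
        static bool wait_for_offset_valid(void)
        {
            if (resetting)
                return false;   /* caller should reschedule and retry */
            calibrated = true;  /* reset finished: complete calibration */
            return true;
        }

        int main(void)
        {
            for (int attempt = 0; attempt < 5 && !wait_for_offset_valid(); attempt++) {
                printf("reset in progress, rescheduling (attempt %d)\n", attempt);
                if (attempt == 2)
                    resetting = false;  /* reset eventually completes */
            }
            printf("calibrated = %d\n", calibrated);
            return 0;
        }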
    • ice: make Tx and Rx vernier offset calibration independent · f029a343
      Committed by Siddaraju DH
      The Tx and Rx calibration and timestamp generation blocks are independent.
      However, the ice driver waits until both blocks are ready before
      configuring either block.
      
      This can result in delay of configuring one block because we have not yet
      received a packet in the other block.
      
      There is no reason to wait to finish programming Tx just because we haven't
      received a packet. Similarly there is no reason to wait to program Rx just
      because we haven't transmitted a packet.
      
      Instead of checking both offset status before programming either block,
      refactor the ice_phy_cfg_tx_offset_e822 and ice_phy_cfg_rx_offset_e822
      functions so that they perform their own offset status checks.
      Additionally, make them also check the offset ready bit to determine if
      the offset values have already been programmed.
      
      Call the individual configure functions directly in
      ice_ptp_wait_for_offset_valid. The functions will now correctly check
      status, and program the offsets if ready. Once the offset is programmed,
      the functions will exit quickly after just checking the offset ready
      register.
      
      Remove the ice_phy_calc_vernier_e822 in ice_ptp_hw.c, as well as the offset
      valid check functions in ice_ptp.c entirely as they are no longer
      necessary.
      
      With this change, the Tx and Rx blocks will each be enabled as soon as
      possible without waiting for the other block to complete calibration. This
      can enable timestamps faster in setups which have a low rate of transmitted
      or received packets. In particular, it can stop a situation where one port
      never receives traffic, and thus never finishes calibration of the Tx
      block, resulting in continuous faults reported by the ptp4l daemon
      application.
      Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      f029a343
    • ice: only check set bits in ice_ptp_flush_tx_tracker · e3ba5248
      Committed by Jacob Keller
      The ice_ptp_flush_tx_tracker function is called to clear all outstanding Tx
      timestamp requests when the port is being brought down. This function
      iterates over the entire list, but this is unnecessary. We only need to
      check the bits which are actually set in the ready bitmap.
      
      Replace this logic with for_each_set_bit, and follow a similar flow as in
      ice_ptp_tx_tstamp_cleanup. Note that it is safe to call dev_kfree_skb_any
      on a NULL pointer as it will perform a no-op so we do not need to verify
      that the skb is actually NULL.
      
      The new implementation also avoids clearing (and thus reading!) the PHY
      timestamp unless the index is marked as having a valid timestamp in the
      timestamp status bitmap. This ensures that we properly clear the status
      registers as appropriate.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      e3ba5248
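      A userspace analog of the for_each_set_bit flow described above (not the
      driver code): walk only the indexes whose bits are set in the tracker
      bitmap instead of scanning the whole table.

        #include <stdint.h>
        #include <stdio.h>

        static void flush_tx_tracker(uint64_t in_use)
        {
            while (in_use) {
                int idx = __builtin_ctzll(in_use);  /* lowest set bit */

                printf("flushing Tx timestamp index %d\n", idx);
                in_use &= in_use - 1;               /* clear that bit */
            }
        }

        int main(void)
        {
            flush_tx_tracker((1ULL << 3) | (1ULL << 17) | (1ULL << 42));
            return 0;
        }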
    • ice: handle flushing stale Tx timestamps in ice_ptp_tx_tstamp · d40fd600
      Committed by Jacob Keller
      In the event of a PTP clock time change due to .adjtime or .settime, the
      ice driver needs to update the cached copy of the PHC time and also discard
      any outstanding Tx timestamps.
      
      This is required because otherwise the wrong copy of the PHC time will be
      used when extending the Tx timestamp. This could result in reporting
      incorrect timestamps to the stack.
      
      The current approach taken to handle this is to call
      ice_ptp_flush_tx_tracker, which will discard any timestamps which are not
      yet complete.
      
      This is problematic for two reasons:
      
      1) it could lead to a potential race condition where the wrong timestamp is
         associated with a future packet.
      
         This can occur with the following flow:
      
         1. Thread A gets request to transmit a timestamped packet, and picks an
            index and transmits the packet
      
         2. Thread B calls ice_ptp_flush_tx_tracker and sees the index in use,
            marking it as discarded. No timestamp read occurs because the status
            bit is not set, but the index is released for re-use
      
         3. Thread A gets a new request to transmit another timestamped packet,
            picks the same (now unused) index and transmits that packet.
      
         4. The PHY transmits the first packet and updates the timestamp slot and
            generates an interrupt.
      
         5. The ice_ptp_tx_tstamp thread executes and sees the interrupt and a
            valid timestamp, but associates it with the new Tx SKB rather than
            the packet the timestamp actually belongs to.
      
         This could result in the previous timestamp being assigned to a new
         packet producing incorrect timestamps and leading to incorrect behavior
         in PTP applications.
      
         This is most likely to occur when the packet rate for Tx timestamp
         requests is very high.
      
      2) on E822 hardware, we must avoid reading a timestamp index more than once
         each time its status bit is set and an interrupt is generated by
         hardware.
      
         We do have some extensive checks for the unread flag to ensure that only
         one of either the ice_ptp_flush_tx_tracker or ice_ptp_tx_tstamp threads
         read the timestamp. However, even with this we can still have cases
         where we "flush" a timestamp that was actually completed in hardware.
         This can lead to cases where we don't read the timestamp index as
         appropriate.
      
      To fix both of these issues, we must avoid calling ice_ptp_flush_tx_tracker
      outside of the teardown path.
      
      Rather than using ice_ptp_flush_tx_tracker, introduce a new state bitmap,
      the stale bitmap. Start this as cleared when we begin a new timestamp
      request. When we're about to extend a timestamp and send it up to the
      stack, first check to see if that stale bit was set. If so, drop the
      timestamp without sending it to the stack.
      
      When we need to update the cached PHC timestamp out of band, just mark all
      currently outstanding timestamps as stale. This will ensure that once
      hardware completes the timestamp we'll ignore it correctly and avoid
      reporting bogus timestamps to userspace.
      
      With this change, we fix potential issues caused by calling
      ice_ptp_flush_tx_tracker during normal operation.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      d40fd600
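      A minimal sketch of the stale-bitmap idea (illustrative only, not the ice
      implementation): a request sets an in-use bit, a PHC time change marks
      everything outstanding as stale, and completions for stale slots are
      dropped instead of being reported to the stack.

        #include <stdint.h>
        #include <stdio.h>

        static uint64_t in_use, stale;

        static void request_tstamp(int idx)
        {
            in_use |= 1ULL << idx;
            stale &= ~(1ULL << idx);   /* a new request starts out valid */
        }

        static void phc_time_changed(void)
        {
            stale |= in_use;           /* all outstanding requests are now suspect */
        }

        static void tstamp_complete(int idx)
        {
            if (stale & (1ULL << idx))
                printf("idx %d: stale, dropped\n", idx);
            else
                printf("idx %d: reported to the stack\n", idx);
            in_use &= ~(1ULL << idx);
            stale &= ~(1ULL << idx);
        }

        int main(void)
        {
            request_tstamp(5);
            phc_time_changed();   /* e.g. a .settime call on the PHC */
            request_tstamp(2);    /* issued after the change: still valid */
            tstamp_complete(5);   /* dropped */
            tstamp_complete(2);   /* reported */
            return 0;
        }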
    • ice: cleanup allocations in ice_ptp_alloc_tx_tracker · c1f3414d
      Committed by Jacob Keller
      The ice_ptp_alloc_tx_tracker function must allocate the timestamp array and
      the bitmap for tracking the currently in use indexes. A future change is
      going to add yet another allocation to this function.
      
      If these allocations fail we need to ensure that we properly cleanup and
      ensure that the pointers in the ice_ptp_tx structure are NULL.
      
      Simplify this logic by allocating to local variables first. If any
      allocation fails, then free everything and exit. Only update the ice_ptp_tx
      structure if all allocations succeed.
      
      This ensures that we have no side effects on the Tx structure unless all
      allocations have succeeded. Thus, no code will see an invalid pointer and
      we don't need to re-assign NULL on cleanup.
      
      This is safe because kernel "free" functions are designed to be NULL safe
      and perform no action if passed a NULL pointer. Thus it is safe to simply
      always call kfree or bitmap_free even if one of those pointers was NULL.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      c1f3414d
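      A small sketch of the allocation pattern described above, with
      hypothetical names: allocate into locals, free everything and bail out on
      any failure (free(NULL) is a no-op), and only assign into the structure
      once every allocation has succeeded.

        #include <stdlib.h>

        struct tx_tracker {
            long *tstamps;
            unsigned long *in_use;
        };

        static int tracker_alloc(struct tx_tracker *tx, size_t len)
        {
            long *tstamps = calloc(len, sizeof(*tstamps));
            unsigned long *in_use = calloc((len + 63) / 64, sizeof(*in_use));

            if (!tstamps || !in_use) {
                free(tstamps);      /* safe even if NULL */
                free(in_use);
                return -1;          /* tx is left completely untouched */
            }

            tx->tstamps = tstamps;
            tx->in_use = in_use;
            return 0;
        }

        int main(void)
        {
            struct tx_tracker tx = { 0 };
            int err = tracker_alloc(&tx, 64);

            free(tx.tstamps);
            free(tx.in_use);
            return err ? 1 : 0;
        }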
    • ice: protect init and calibrating check in ice_ptp_request_ts · 3ad5c10b
      Committed by Jacob Keller
      When requesting a new timestamp, the ice_ptp_request_ts function does not
      hold the Tx tracker lock while checking init and calibrating. This means
      that we might issue a new timestamp request just after the Tx timestamp
      tracker starts being deinitialized. This could lead to incorrect access of
      the timestamp structures. Correct this by moving the init and calibrating
      checks under the lock, and updating the flows which modify these fields to
      use the lock.
      
      Note that we do not need to hold the lock while checking for tx->init in
      ice_ptp_tx_tstamp. This is because the teardown function will use
      synchronize_irq after clearing the flag to ensure that the threaded
      interrupt completes. Either a) the tx->init flag will be cleared before the
      ice_ptp_tx_tstamp function starts, thus it will exit immediately, or b) the
      threaded interrupt will be executing and the synchronize_irq will wait
      until the threaded interrupt has completed, at which point we know the init
      field has definitely been cleared and new interrupts will not execute the Tx
      timestamp thread function.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      3ad5c10b
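      An illustrative userspace sketch of the locking rule described above (not
      the driver code; names are invented): the init/calibrating flags are
      checked and an index reserved only while the tracker lock is held, so a
      teardown path that clears those flags under the same lock cannot race
      with a new timestamp request.

        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>

        struct tracker {
            pthread_mutex_t lock;
            bool init;
            bool calibrating;
            int next_idx;
        };

        /* Returns a reserved index, or -1 if the tracker is not ready. */
        static int request_ts(struct tracker *t)
        {
            int idx = -1;

            pthread_mutex_lock(&t->lock);
            if (t->init && !t->calibrating)
                idx = t->next_idx++;
            pthread_mutex_unlock(&t->lock);
            return idx;
        }

        int main(void)
        {
            struct tracker t = {
                .lock = PTHREAD_MUTEX_INITIALIZER,
                .init = true,
            };

            printf("reserved index %d\n", request_ts(&t));
            return 0;
        }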
    • Merge branch 'mlx5-Support-tc-police-jump-conform-exceed-attribute' · ddda6326
      Committed by Jakub Kicinski
      Saeed Mahameed says:
      
      ====================
      Support tc police jump conform-exceed attribute
      
      The tc police action conform-exceed option defines how to handle
      packets which exceed or conform to the configured bandwidth limit.
      One of the possible conform-exceed values is jump, which skips over
      a specified number of actions.
      This series adds support for conform-exceed jump action.
      
      The series adds platform support for branching actions by providing
      true/false flow attributes to the branching action.
      This is necessary for supporting police jump, as each branch may
      execute a different action list.
      
      The first five patches are preparation patches:
      - Patches 1 and 2 add support for actions with no destinations (e.g. drop)
      - Patch 3 refactors the code for subsequent function reuse
      - Patch 4 defines an abstract way for identifying terminating actions
      - Patch 5 updates the action list validation logic to account for branching actions
      
      The following three patches introduce an interface for abstracting branching
      actions:
      - Patch 6 introduces an abstract api for defining branching actions
      - Patch 7 generically instantiates the branching flow attributes using
        the abstract API
      
      Patch 8 adds the platform support for jump actions, by executing the following
      sequence:
        a. Store the jumping flow attr
        b. Identify the jump target action while iterating the actions list.
        c. Instantiate a new flow attribute after the jump target action.
           This is the flow attribute that the branching action should jump to.
        d. Set the target post action id on:
          d.1. The jumping attribute, thus realizing the jump functionality.
          d.2. The attribute preceding the target jump attr, if not terminating.
      
      The next patches apply the platform's branching attributes to the police
      action:
      - Patch 9 is a refactor patch
      - Patch 10 initializes the post meter table with the red/green flow attributes,
                 as were initialized by the platform
      - Patch 11 enables the offload of meter actions using jump conform-exceed
                 value.
      ====================
      
      Link: https://lore.kernel.org/all/20221203221337.29267-1-saeed@kernel.org/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ddda6326
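      A toy model of the conform-exceed "jump N" semantics described above (not
      mlx5 offload code; the action names are invented): an exceeding packet
      skips the next N actions in the list, while a conforming packet continues
      with the very next action, and both branches converge afterwards.

        #include <stdio.h>

        enum verdict { CONFORM, EXCEED };

        static void run_actions(const char *const *actions, int n_actions,
                                int police_pos, int jump, enum verdict v)
        {
            for (int i = 0; i < n_actions; i++) {
                printf("  exec: %s\n", actions[i]);
                if (i == police_pos && v == EXCEED)
                    i += jump;  /* skip the next 'jump' actions */
            }
        }

        int main(void)
        {
            const char *actions[] = {
                "police rate 1mbit conform-exceed jump 2",
                "set dscp (conform branch only)",
                "mirror to monitor port (conform branch only)",
                "forward to port 1 (both branches)",
            };

            puts("conforming packet:");
            run_actions(actions, 4, 0, 2, CONFORM);
            puts("exceeding packet:");
            run_actions(actions, 4, 0, 2, EXCEED);
            return 0;
        }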
    • net/mlx5e: TC, allow meter jump control action · 3603f266
      Committed by Oz Shlomo
      Separate the matchall police action validation from flower validation.
      Isolate the action validation logic in the police action parser.
      Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20221203221337.29267-12-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3603f266
    • net/mlx5e: TC, init post meter rules with branching attributes · 0d8c38d4
      Committed by Oz Shlomo
      Instantiate the post meter actions with the platform initialized branching
      action attributes.
      Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20221203221337.29267-11-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0d8c38d4
    • net/mlx5e: TC, rename post_meter actions · 3fcb94e3
      Committed by Oz Shlomo
      Currently post meter supports only the pipe/drop conform-exceed policy.
      This assumption is reflected in several variable names.
      Rename the following variables as a pre-step for using the generalized
      branching action platform.
      
      Rename fwd_green_rule/drop_red_rule to green_rule/red_rule respectively.
      Repurpose red_counter/green_counter to act_counter/drop_counter to allow
      police conform-exceed configurations that do not drop.
      Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20221203221337.29267-10-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3fcb94e3
    • net/mlx5e: TC, initialize branching action with target attr · c84fa1ab
      Committed by Oz Shlomo
      Identify the jump target action when iterating the action list.
      Initialize the jump target attr with the jumping attribute during the
      parsing phase. Initialize the jumping attr post action with the target
      during the offload phase.
      Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20221203221337.29267-9-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c84fa1ab
    • net/mlx5e: TC, initialize branch flow attributes · f86488cb
      Committed by Oz Shlomo
      Initialize flow attribute for drop, accept, pipe and jump branching actions.
      
      Instantiate a flow attribute instance according to the specified branch
      control action. Store the branching attributes on the branching action
      flow attribute during the parsing phase. Then, during the offload phase,
      allocate the relevant mod header objects to the branching actions.
      Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20221203221337.29267-8-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f86488cb