Commits · c83b49383b595be50647f0c764a48c78b5f3c4f8 · openeuler / Kernel

15 May, 2023 1 commit

net: nsh: Use correct mac_offset to unwind gso skb in nsh_gso_segment() · c83b4938

Committed by Dong Chenchen on May 11, 2023

As the call trace shows, skb_panic was caused by wrong skb->mac_header
in nsh_gso_segment():

invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 3 PID: 2737 Comm: syz Not tainted 6.3.0-next-20230505 #1
RIP: 0010:skb_panic+0xda/0xe0
Call Trace:
 skb_push+0x91/0xa0
 nsh_gso_segment+0x4f3/0x570
 skb_mac_gso_segment+0x19e/0x270
 __skb_gso_segment+0x1e8/0x3c0
 validate_xmit_skb+0x452/0x890
 validate_xmit_skb_list+0x99/0xd0
 sch_direct_xmit+0x294/0x7c0
 __dev_queue_xmit+0x16f0/0x1d70
 packet_xmit+0x185/0x210
 packet_snd+0xc15/0x1170
 packet_sendmsg+0x7b/0xa0
 sock_sendmsg+0x14f/0x160

The root cause is:
nsh_gso_segment() uses skb->network_header - nhoff to reset mac_header
in skb_gso_error_unwind() if inner-layer protocol gso fails.
However, skb->network_header may be reset by the inner-layer protocol
gso function, e.g. mpls_gso_segment. A skb->mac_header reset from the
inaccurate network_header will be larger than the skb headroom.

nsh_gso_segment
    nhoff = skb->network_header - skb->mac_header;
    __skb_pull(skb,nsh_len)
    skb_mac_gso_segment
        mpls_gso_segment
            skb_reset_network_header(skb);//skb->network_header+=nsh_len
            return -EINVAL;
    skb_gso_error_unwind
        skb_push(skb, nsh_len);
        skb->mac_header = skb->network_header - nhoff;
        // skb->mac_header > skb->headroom, cause skb_push panic

Use correct mac_offset to restore mac_header and get rid of nhoff.
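
The offset arithmetic above can be reproduced in userspace. The following is a minimal sketch, not the kernel code: struct fake_skb and every function name are hypothetical stand-ins, and the inner handler is reduced to the one side effect the commit message describes (resetting network_header to the current data pointer before failing).

```c
#include <stdint.h>

/* Toy stand-in for the sk_buff fields involved. As in the kernel, each
 * value is an offset from the start of the buffer (skb->head). */
struct fake_skb {
    uint16_t data;            /* offset of the current data pointer */
    uint16_t mac_header;
    uint16_t network_header;
};

/* Models an inner GSO handler that resets network_header to the current
 * data pointer and then fails, as the commit describes for MPLS. */
static void inner_gso_fails(struct fake_skb *skb)
{
    skb->network_header = skb->data;
}

/* Old unwind: rebuild mac_header from the (now modified) network_header. */
uint16_t unwind_with_nhoff(struct fake_skb skb, uint16_t nsh_len)
{
    uint16_t nhoff = (uint16_t)(skb.network_header - skb.mac_header);

    skb.data = (uint16_t)(skb.data + nsh_len);   /* models __skb_pull() */
    inner_gso_fails(&skb);
    skb.data = (uint16_t)(skb.data - nsh_len);   /* models skb_push() */
    return (uint16_t)(skb.network_header - nhoff);
}

/* Fixed unwind: save mac_header before pulling and restore it verbatim. */
uint16_t unwind_with_mac_offset(struct fake_skb skb, uint16_t nsh_len)
{
    uint16_t mac_offset = skb.mac_header;

    skb.data = (uint16_t)(skb.data + nsh_len);
    inner_gso_fails(&skb);
    skb.data = (uint16_t)(skb.data - nsh_len);
    skb.mac_header = mac_offset;
    return skb.mac_header;
}
```

With illustrative values data = 78, mac_header = 64, network_header = 78 and nsh_len = 8, the old path returns 72 (off by nsh_len) while the saved offset returns 64.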

Fixes: c411ed85 ("nsh: add GSO support")
Reported-by: syzbot+632b5d9964208bfef8c0@syzkaller.appspotmail.com
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 May, 2023 9 commits

Merge branch 'hns3-fixes' · b41caade

Committed by David S. Miller on May 13, 2023

Hao Lan says:

====================
net: hns3: fix some bug for hns3

There are some bugfixes for the HNS3 ethernet driver. patch#1 fixes a
missing check for rx packets. patch#2 fixes VF promisc mode not being
updated when the mac table is full, and patch#3 fixes interrupts not
being initialized in VF FLR.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix reset timeout when enable full VF · 6b45d5ff

Committed by Jijie Shao on May 12, 2023

The timeout of the cmdq reset command has been increased to
resolve the reset timeout issue in the full VF scenario.
The timeout of other cmdq commands remains unchanged.

Fixes: 8d307f8e ("net: hns3: create new set of unified hclge_comm_cmd_send APIs")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix reset delay time to avoid configuration timeout · 814d0c78

Committed by Jie Wang on May 12, 2023

Currently the hns3 vf function reset delays 5000ms before vf rebuild
process. In product applications, this delay is too long for application
configurations and causes configuration timeout.

According to the tests, 500ms delay is enough for reset process except PF
FLR. So this patch modifies delay to 500ms in these scenarios.

Fixes: 6988eb2a ("net: hns3: Add support to reset the enet/ring mgmt layer")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix sending pfc frames after reset issue · f14db070

Committed by Jijie Shao on May 12, 2023

To prevent the system from abnormally sending PFC frames after an
abnormal reset, the hns3 driver notifies the firmware to disable pfc
before reset.

Fixes: 35d93a30 ("net: hns3: adjust the process of PF reset")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix output information incomplete for dumping tx queue info with debugfs · 89f6bfb0

Committed by Jie Wang on May 12, 2023

In function hns3_dump_tx_queue_info, the print buffer is not large enough
when the tx BD number is configured to 32760. As a result, information for
several BDs is not displayed.

So fix it by increasing the tx queue print buffer length.

Fixes: 630a6738 ("net: hns3: adjust string spaces of some parameters of tx bd info in debugfs")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-rzn1-a5psw-stp' · 843eb679

Committed by David S. Miller on May 13, 2023

Alexis Lothoré says:

====================
net: dsa: rzn1-a5psw: fix STP states handling

This small series fixes STP support and while adding a new function to
enable/disable learning, use that to disable learning on standalone ports
at switch setup as reported by Vladimir Oltean.

This series was initially submitted on net-next by Clement Leger, but some
career evolutions have made him hand those topics over to me.
Also, this new revision is submitted on net instead of net-next for V1
based on Vladimir Oltean's suggestion.

Changes since v2:
- fix commit split by moving A5PSW_MGMT_CFG_ENABLE in relevant commit
- fix reverse christmas tree ordering in a5psw_port_stp_state_set

Changes since v1:
- fix typos in commit messages and doc
- re-split STP states handling commit
- add Fixes: tag and new Signed-off-by
- submit series as fix on net instead of net-next
- split learning and blocking setting functions
- remove unused define A5PSW_PORT_ENA_TX_SHIFT
- add boolean for tx/rx enabled for clarity
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: rzn1-a5psw: disable learning for standalone ports · ec52b69c

Committed by Clément Léger on May 12, 2023

When ports are in standalone mode, they should have learning disabled to
avoid adding new entries in the MAC lookup table which might be used by
other bridge ports to forward packets. While adding that, also make sure
learning is enabled for CPU port.

Fixes: 888cdb89 ("net: dsa: rzn1-a5psw: add Renesas RZ/N1 advanced 5 port switch driver")
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: rzn1-a5psw: fix STP states handling · ebe9bc50

Committed by Alexis Lothoré on May 12, 2023

stp_set_state() should actually allow receiving BPDU while in LEARNING
mode which is not the case. Additionally, the BLOCKEN bit does not
actually forbid sending forwarded frames from that port. To fix this, add
a5psw_port_tx_enable() function which allows disabling TX. However, while
its name suggests that TX is totally disabled, it is not, and BPDUs can
still be sent even if disabled. This can be done by using forced
forwarding with the switch tagging mechanism but keeping "filtering"
disabled (which is already the case in the rzn1-a5sw tag driver). With
these fixes, STP support is now functional.

Fixes: 888cdb89 ("net: dsa: rzn1-a5psw: add Renesas RZ/N1 advanced 5 port switch driver")
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: rzn1-a5psw: enable management frames for CPU port · 9e4b45f2

Committed by Clément Léger on May 12, 2023

Currently, management frames were discarded before reaching the CPU port due
to a misconfiguration of the MGMT_CONFIG register. Enable them by setting
the correct value in this register in order to correctly receive management
frames and handle STP.

13 May, 2023 1 commit

erspan: get the proto with the md version for collect_md · d80fc101

Committed by Xin Long on May 11, 2023

In commit 20704bd1 ("erspan: build the header with the right proto
according to erspan_ver"), it gets the proto with t->parms.erspan_ver,
but t->parms.erspan_ver is not used by collect_md branch, and instead
it should get the proto with md->version for collect_md.
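
The selection described above can be sketched as follows. This is a simplified userspace model under stated assumptions: the struct and function names are hypothetical, and the two constants mirror the values of ETH_P_ERSPAN (0x88BE) and ETH_P_ERSPAN2 (0x22EB) without the byte-order conversion the kernel applies.

```c
#include <stddef.h>
#include <stdint.h>

#define FAKE_ETH_P_ERSPAN   0x88BE  /* ERSPAN type II  */
#define FAKE_ETH_P_ERSPAN2  0x22EB  /* ERSPAN type III */

/* Hypothetical, trimmed-down views of the tunnel and its metadata. */
struct fake_tunnel {
    int collect_md;            /* non-zero: per-packet metadata mode */
    uint8_t parms_erspan_ver;  /* models t->parms.erspan_ver */
};

struct fake_md {
    uint8_t version;           /* models md->version */
};

/* Before the fix: the tunnel-wide version is used unconditionally. */
uint16_t erspan_proto_old(const struct fake_tunnel *t, const struct fake_md *md)
{
    (void)md;
    return t->parms_erspan_ver == 1 ? FAKE_ETH_P_ERSPAN : FAKE_ETH_P_ERSPAN2;
}

/* After the fix: in collect_md mode the metadata version decides. */
uint16_t erspan_proto_new(const struct fake_tunnel *t, const struct fake_md *md)
{
    uint8_t ver = (t->collect_md && md != NULL) ? md->version
                                                : t->parms_erspan_ver;

    return ver == 1 ? FAKE_ETH_P_ERSPAN : FAKE_ETH_P_ERSPAN2;
}
```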

Thanks to Kevin for pointing this out.

Fixes: 20704bd1 ("erspan: build the header with the right proto according to erspan_ver")
Fixes: 94d7d8f2 ("ip6_gre: add erspan v2 support")
Reported-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 May, 2023 15 commits

tcp: fix possible sk_priority leak in tcp_v4_send_reset() · 1e306ec4

Committed by Eric Dumazet on May 11, 2023

When tcp_v4_send_reset() is called with @sk == NULL,
we do not change ctl_sk->sk_priority, which could have been
set from a prior invocation.

Change tcp_v4_send_reset() to set sk_priority and sk_mark
fields before calling ip_send_unicast_reply().

This means tcp_v4_send_reset() and tcp_v4_send_ack()
no longer have to clear ctl_sk->sk_mark after
their call to ip_send_unicast_reply().
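
The leak comes from state left behind on a control socket that is reused across replies. A minimal userspace model, assuming a single global control socket and hypothetical names throughout (it is not the kernel implementation), might look like this:

```c
#include <stddef.h>
#include <stdint.h>

struct fake_sock {
    uint32_t priority;
    uint32_t mark;
};

/* Shared control socket, reused for every reply (a global in this model). */
static struct fake_sock ctl_sk;

/* Old flow: fields are written only when sk != NULL, and only the mark is
 * cleared afterwards, so a previous priority survives. Returns the
 * priority that the reply would have been sent with. */
uint32_t send_reset_old(const struct fake_sock *sk)
{
    uint32_t used;

    if (sk != NULL) {
        ctl_sk.mark = sk->mark;
        ctl_sk.priority = sk->priority;
    }
    used = ctl_sk.priority;   /* the reply would be sent here */
    ctl_sk.mark = 0;
    return used;
}

/* New flow: both fields are always set before sending, so nothing needs
 * to be cleared afterwards and nothing leaks between calls. */
uint32_t send_reset_new(const struct fake_sock *sk)
{
    ctl_sk.mark = sk != NULL ? sk->mark : 0;
    ctl_sk.priority = sk != NULL ? sk->priority : 0;
    return ctl_sk.priority;
}
```

Calling send_reset_old() first with a socket of priority 6 and then with NULL reuses priority 6 for the second reply; send_reset_new() returns 0 in the same sequence.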

Fixes: f6c0f5d2 ("tcp: honor SO_PRIORITY in TIME_WAIT state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

vsock: avoid to close connected socket after the timeout · 6d4486ef

Committed by Zhuang Shengen on May 11, 2023

When a client and a server establish a connection through vsock,
the client sends a request to the server to initiate the connection,
then starts a timer to wait for the server's response. The server's
RESPONSE message may arrive at the same moment the timer expires. The
server's RESPONSE message is processed first, and the connection is
established. However, because the client's timer has also expired, the
original processing logic of the client directly sets the state of this
vsock to CLOSE and returns ETIMEDOUT. The server is not notified when the
port is released, so the server port remains in use.
When the client's vsock_connect times out, it should check whether the sk
state is ESTABLISHED. If it is, the connection has been established and
the client should not set the sk state to CLOSE.
Note: I encountered this issue on kernel-4.18, which can be fixed by
this patch. Then I checked the latest code in the community
and found similar issue.

Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
Signed-off-by: Zhuang Shengen <zhuangshengen@huawei.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: disable RXFCS and RXALL features by default · 134120b0

Committed by Pieter Jansen van Vuuren on May 11, 2023

By default we would not want RXFCS and RXALL features enabled as they are
mainly intended for debugging purposes. This does not stop users from
enabling them later on as needed.

Fixes: 8e57daf7 ("sfc_ef100: RX path for EF100")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
Co-developed-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ice: Fix undersized tx_flags variable · 9113302b

Committed by Jan Sokolowski on May 11, 2023

As not all ICE_TX_FLAGS_* fit in current 16-bit limited
tx_flags field that was introduced in the Fixes commit,
VLAN-related information would be discarded completely.
As such, creating a vlan and trying to run ping through
would result in no traffic passing.

Fix that by refactoring tx_flags variable into flags only and
a separate variable that holds VLAN ID. As there is some space left,
type variable can fit between those two. Pahole reports no size
change to ice_tx_buf struct.
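
The truncation can be illustrated with a small userspace sketch. The flag layout here is an assumption made for the example (a VLAN tag packed into the upper 16 bits of a 32-bit flags word), and all names are hypothetical rather than taken from the ice driver.

```c
#include <stdint.h>

/* Hypothetical layout: low bits are flags, VLAN tag sits above bit 15. */
#define FAKE_TX_FLAGS_HW_VLAN  (1u << 1)
#define FAKE_TX_FLAGS_VLAN_S   16

/* Undersized: a 16-bit field silently drops everything above bit 15. */
struct tx_buf_old {
    uint16_t tx_flags;
};

/* Split layout: flags and the VLAN ID live in separate fields. */
struct tx_buf_new {
    uint16_t tx_flags;
    uint16_t vid;
};

/* Returns the VLAN ID that can be read back from the old layout. */
uint16_t vlan_seen_old(uint16_t vid)
{
    uint32_t flags = FAKE_TX_FLAGS_HW_VLAN |
                     ((uint32_t)vid << FAKE_TX_FLAGS_VLAN_S);
    struct tx_buf_old buf = { .tx_flags = (uint16_t)flags }; /* truncates */

    return (uint16_t)((uint32_t)buf.tx_flags >> FAKE_TX_FLAGS_VLAN_S);
}

/* Returns the VLAN ID that can be read back from the split layout. */
uint16_t vlan_seen_new(uint16_t vid)
{
    struct tx_buf_new buf = {
        .tx_flags = FAKE_TX_FLAGS_HW_VLAN,
        .vid = vid,
    };

    return buf.vid;
}
```

In this model a VLAN ID of 100 reads back as 0 from the 16-bit field, which matches the symptom of VLAN information being discarded, and as 100 from the split layout.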

Fixes: aa1d3faf ("ice: Robustify cleaning/completing XDP Tx buffers")
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: exclude wireless drivers from netdev · 47af4291

Committed by Jakub Kicinski on May 11, 2023

It seems that we mostly get netdev CCed on wireless patches
which are written by people who don't know any better and
CC everything that get_maintainers spits out, rather than
patches which indeed could benefit from general networking
review.

Marking them down in patchwork as Awaiting Upstream is
a bit tedious.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: fix NFP_NET_MAX_DSCP definition error · de9c1a23

Committed by Huayu Chen on May 11, 2023

The patch corrects the NFP_NET_MAX_DSCP definition in the main.h file.

The incorrect definition results in DSCP bits not being mapped properly
when DCB is set. When NFP_NET_MAX_DSCP was defined as 4, the next 60 DSCP
bits failed to be set.

Fixes: 9b7fe804 ("nfp: add DCB IEEE support")
Cc: stable@vger.kernel.org
Signed-off-by: Huayu Chen <huayu.chen@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: don't CC docs@ for netlink spec changes · 01e8f6cd

Committed by Jakub Kicinski on May 10, 2023

Documentation/netlink/ contains machine-readable protocol
specs in YAML. Those are much like device tree bindings,
no point CCing docs@ for the changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>

MAINTAINERS: sctp: move Neil to CREDITS · d03a2f17

Committed by Marcelo Ricardo Leitner on May 10, 2023

Neil moved away from SCTP related duties.
Move him to CREDITS then and while at it, update SCTP
project website.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83867: add w/a for packet errors seen with short cables · 0b01db27

Committed by Grygorii Strashko on May 10, 2023

Introduce the W/A for packet errors seen with short cables (<1m) between
two DP83867 PHYs.

The W/A recommended by DM requires FFE Equalizer Configuration tuning by
writing value 0x0E81 to DSP_FFE_CFG register (0x012C), surrounded by hard
and soft resets as follows:

write_reg(0x001F, 0x8000); //hard reset
write_reg(DSP_FFE_CFG, 0x0E81);
write_reg(0x001F, 0x4000); //soft reset

Since the DP83867 PHY DM says "Changing this register to 0x0E81, will not
affect Long Cable performance.", enable the W/A by default.

Fixes: 2a10154a ("net: phy: dp83867: Add TI dp83867 phy")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: Better handle pm_runtime_get() failing in .remove() · f816b982

Committed by Uwe Kleine-König on May 10, 2023

In the (unlikely) event that pm_runtime_get() (disguised as
pm_runtime_resume_and_get()) fails, the remove callback returned an
error early. The problem with this is that the driver core ignores the
error value and continues removing the device. This results in a
resource leak. Worse the devm allocated resources are freed and so if a
callback of the driver is called later the register mapping is already
gone which probably results in a crash.

Fixes: a31eda65 ("net: fec: fix clock count mis-match")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230510200020.1534610-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: remove nexthop_fib6_nh_bh() · ef1148d4

Committed by Eric Dumazet on May 10, 2023

After blamed commit, nexthop_fib6_nh_bh() and nexthop_fib6_nh()
are the same.

Delete nexthop_fib6_nh_bh(), and convert /proc/net/ipv6_route
to standard rcu to avoid this splat:

[ 5723.180080] WARNING: suspicious RCU usage
[ 5723.180083] -----------------------------
[ 5723.180084] include/net/nexthop.h:516 suspicious rcu_dereference_check() usage!
[ 5723.180086]
other info that might help us debug this:

[ 5723.180087]
rcu_scheduler_active = 2, debug_locks = 1
[ 5723.180089] 2 locks held by cat/55856:
[ 5723.180091] #0: ffff9440a582afa8 (&p->lock){+.+.}-{3:3}, at: seq_read_iter (fs/seq_file.c:188)
[ 5723.180100] #1: ffffffffaac07040 (rcu_read_lock_bh){....}-{1:2}, at: rcu_lock_acquire (include/linux/rcupdate.h:326)
[ 5723.180109]
stack backtrace:
[ 5723.180111] CPU: 14 PID: 55856 Comm: cat Tainted: G S        I        6.3.0-dbx-DEV #528
[ 5723.180115] Call Trace:
[ 5723.180117]  <TASK>
[ 5723.180119] dump_stack_lvl (lib/dump_stack.c:107)
[ 5723.180124] dump_stack (lib/dump_stack.c:114)
[ 5723.180126] lockdep_rcu_suspicious (include/linux/context_tracking.h:122)
[ 5723.180132] ipv6_route_seq_show (include/net/nexthop.h:?)
[ 5723.180135] ? ipv6_route_seq_next (net/ipv6/ip6_fib.c:2605)
[ 5723.180140] seq_read_iter (fs/seq_file.c:272)
[ 5723.180145] seq_read (fs/seq_file.c:163)
[ 5723.180151] proc_reg_read (fs/proc/inode.c:316 fs/proc/inode.c:328)
[ 5723.180155] vfs_read (fs/read_write.c:468)
[ 5723.180160] ? up_read (kernel/locking/rwsem.c:1617)
[ 5723.180164] ksys_read (fs/read_write.c:613)
[ 5723.180168] __x64_sys_read (fs/read_write.c:621)
[ 5723.180170] do_syscall_64 (arch/x86/entry/common.c:?)
[ 5723.180174] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 5723.180177] RIP: 0033:0x7fa455677d2a

Fixes: 09eed119 ("neighbour: switch to standard rcu, instead of rcu_bh")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230510154646.370659-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: change per-devlink netdev notifier to static one · e93c9378

Committed by Jiri Pirko on May 10, 2023

The commit 565b4824 ("devlink: change port event netdev notifier
from per-net to global") changed original per-net notifier to be
per-devlink instance. That fixed the issue of non-receiving events
of netdev uninit if that moved to a different namespace.
That worked fine in -net tree.

However, later on when commit ee75f1fc ("net/mlx5e: Create
separate devlink instance for ethernet auxiliary device") and
commit 72ed5d56 ("net/mlx5: Suspend auxiliary devices only in
case of PCI device suspend") were merged, a deadlock was introduced
when removing a namespace with devlink instance with another nested
instance.

Here is the bad flow example resulting in deadlock with mlx5:
net_cleanup_work -> cleanup_net (takes down_read(&pernet_ops_rwsem)) ->
devlink_pernet_pre_exit() -> devlink_reload() ->
mlx5_devlink_reload_down() -> mlx5_unload_one_devl_locked() ->
mlx5_detach_device() -> del_adev() -> mlx5e_remove() ->
mlx5e_destroy_devlink() -> devlink_free() ->
unregister_netdevice_notifier() (takes down_write(&pernet_ops_rwsem))

Steps to reproduce:
$ modprobe mlx5_core
$ ip netns add ns1
$ devlink dev reload pci/0000:08:00.0 netns ns1
$ ip netns del ns1

Resolve this by converting the notifier from per-devlink instance to
a static one registered during init phase and leaving it registered
forever. Use this notifier for all devlink port instances created
later on.

Note that a tree needs this fix only in case all of the cited Fixes
commits are present.

Reported-by: Moshe Shemesh <moshe@nvidia.com>
Fixes: 565b4824 ("devlink: change port event netdev notifier from per-net to global")
Fixes: ee75f1fc ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
Fixes: 72ed5d56 ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230510144621.932017-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-seg6-make-srv6_end_dt4_l3vpn_test-more-robust' · 7ce93d6f

Committed by Jakub Kicinski on May 11, 2023

Andrea Mayer says:

====================
selftests: seg6: make srv6_end_dt4_l3vpn_test more robust

This patchset aims to improve and make more robust the selftests performed to
check whether SRv6 End.DT4 behavior works as expected under different system
configurations.
Some Linux distributions enable Duplicate Address Detection and Reverse
Path Filtering mechanisms by default which can interfere with SRv6 End.DT4
behavior and cause selftests to fail.

The following patches improve selftests for End.DT4 by taking these two
mechanisms into account. Specifically:
 - patch 1/2: selftests: seg6: disable DAD on IPv6 router cfg for
              srv6_end_dt4_l3vpn_test
 - patch 2/2: selftets: seg6: disable rp_filter by default in
              srv6_end_dt4_l3vpn_test
====================

Link: https://lore.kernel.org/r/20230510111638.12408-1-andrea.mayer@uniroma2.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftets: seg6: disable rp_filter by default in srv6_end_dt4_l3vpn_test · f97b8401

Committed by Andrea Mayer on May 10, 2023

On some distributions, the rp_filter is automatically set (=1) by
default on a netdev basis (also on VRFs).
In an SRv6 End.DT4 behavior, decapsulated IPv4 packets are routed using
the table associated with the VRF bound to that tunnel. During lookup
operations, the rp_filter can lead to packet loss when activated on the
VRF.
Therefore, we chose to make this selftest more robust by explicitly
disabling the rp_filter during tests (as it is automatically set by some
Linux distributions).

Fixes: 2195444e ("selftests: add selftest for the SRv6 End.DT4 behavior")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: seg6: disable DAD on IPv6 router cfg for srv6_end_dt4_l3vpn_test · 21a933c7

Committed by Andrea Mayer on May 10, 2023

The srv6_end_dt4_l3vpn_test instantiates a virtual network consisting of
several routers (rt-1, rt-2) and hosts.
When the IPv6 addresses of rt-{1,2} routers are configured, the Duplicate
Address Detection (DAD) kicks in when enabled in the Linux distros running
the selftests. DAD is used to check whether an IPv6 address is already
assigned in a network. Such a mechanism consists of sending a Neighbor
Solicitation for the tentative address and waiting to see whether another
node answers.
As the DAD process could take too long to complete, it may cause the
failing of some tests carried out by the srv6_end_dt4_l3vpn_test script.

To make the srv6_end_dt4_l3vpn_test more robust, we disable DAD on routers
since we configure the virtual network manually and do not need any address
deduplication mechanism at all.

Fixes: 2195444e ("selftests: add selftest for the SRv6 End.DT4 behavior")
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

11 May, 2023 10 commits

Merge tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 6e27831b

Committed by Linus Torvalds on May 11, 2023

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

   - mtk_eth_soc: fix NULL pointer dereference

  Previous releases - regressions:

   - core:
      - skb_partial_csum_set() fix against transport header magic value
      - fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
      - annotate sk->sk_err write from do_recvmmsg()
      - add vlan_get_protocol_and_depth() helper

   - netlink: annotate accesses to nlk->cb_running

   - netfilter: always release netdev hooks from notifier

  Previous releases - always broken:

   - core: deal with most data-races in sk_wait_event()

   - netfilter: fix possible bug_on with enable_hooks=1

   - eth: bonding: fix send_peer_notif overflow

   - eth: xpcs: fix incorrect number of interfaces

   - eth: ipvlan: fix out-of-bounds caused by unclear skb->cb

   - eth: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register"

* tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
  af_unix: Fix data races around sk->sk_shutdown.
  af_unix: Fix a data race of sk->sk_receive_queue->qlen.
  net: datagram: fix data-races in datagram_poll()
  net: mscc: ocelot: fix stat counter register values
  ipvlan:Fix out-of-bounds caused by unclear skb->cb
  docs: networking: fix x25-iface.rst heading & index order
  gve: Remove the code of clearing PBA bit
  tcp: add annotations around sk->sk_shutdown accesses
  net: add vlan_get_protocol_and_depth() helper
  net: pcs: xpcs: fix incorrect number of interfaces
  net: deal with most data-races in sk_wait_event()
  net: annotate sk->sk_err write from do_recvmmsg()
  netlink: annotate accesses to nlk->cb_running
  kselftest: bonding: add num_grat_arp test
  selftests: forwarding: lib: add netns support for tc rule handle stats get
  Documentation: bonding: fix the doc of peer_notif_delay
  bonding: fix send_peer_notif overflow
  net: ethernet: mtk_eth_soc: fix NULL pointer dereference
  selftests: nft_flowtable.sh: check ingress/egress chain too
  selftests: nft_flowtable.sh: monitor result file sizes
  ...

Merge tag 'media/v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 691e1eee

Committed by Linus Torvalds on May 11, 2023

Pull media fixes from Mauro Carvalho Chehab:

 - fix some unused-variable warning in mtk-mdp3

 - ignore unused suspend operations in nxp

 - some driver fixes in rcar-vin

* tag 'media/v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: platform: mtk-mdp3: work around unused-variable warning
  media: nxp: ignore unused suspend operations
  media: rcar-vin: Select correct interrupt mode for V4L2_FIELD_ALTERNATE
  media: rcar-vin: Fix NV12 size alignment
  media: rcar-vin: Gen3 can not scale NV12

Merge tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · cceac926

Committed by Jakub Kicinski on May 10, 2023

Pablo Neira Ayuso says:

====================
Netfilter updates for net

The following patchset contains Netfilter fixes for net:

1) Fix UAF when releasing netnamespace, from Florian Westphal.

2) Fix possible BUG_ON when nf_conntrack is enabled with enable_hooks,
   from Florian Westphal.

3) Fixes for nft_flowtable.sh selftest, from Boris Sukholitko.

4) Extend nft_flowtable.sh selftest to cover integration with
   ingress/egress hooks, from Florian Westphal.

* tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  selftests: nft_flowtable.sh: check ingress/egress chain too
  selftests: nft_flowtable.sh: monitor result file sizes
  selftests: nft_flowtable.sh: wait for specific nc pids
  selftests: nft_flowtable.sh: no need for ps -x option
  selftests: nft_flowtable.sh: use /proc for pid checking
  netfilter: conntrack: fix possible bug_on with enable_hooks=1
  netfilter: nf_tables: always release netdev hooks from notifier
====================

Link: https://lore.kernel.org/r/20230510083313.152961-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'af_unix-fix-two-data-races-reported-by-kcsan' · 33dcee99

Committed by Jakub Kicinski on May 10, 2023

Kuniyuki Iwashima says:

====================
af_unix: Fix two data races reported by KCSAN.

KCSAN reported data races around these two fields for AF_UNIX sockets.

  * sk->sk_receive_queue->qlen
  * sk->sk_shutdown

Let's annotate them properly.
====================

Link: https://lore.kernel.org/r/20230510003456.42357-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Fix data races around sk->sk_shutdown. · e1d09c2c

Committed by Kuniyuki Iwashima on May 09, 2023

KCSAN found a data race around sk->sk_shutdown where unix_release_sock()
and unix_shutdown() update it under unix_state_lock(), OTOH unix_poll()
and unix_dgram_poll() read it locklessly.

We need to annotate the writes and reads with WRITE_ONCE() and READ_ONCE().
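
The annotation pattern can be sketched in userspace as follows. The two macros are simplified stand-ins that rely on the GCC/Clang __typeof__ extension (the kernel's own READ_ONCE()/WRITE_ONCE() handle more cases), and struct fake_sock and both functions are hypothetical.

```c
#include <stdint.h>

/* Simplified stand-ins: force exactly one volatile access to x. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

#define FAKE_SHUTDOWN_MASK 3   /* both directions shut down */

struct fake_sock {
    uint8_t sk_shutdown;
};

/* Writer side: in the kernel this update happens under unix_state_lock(). */
void fake_release(struct fake_sock *sk)
{
    WRITE_ONCE(sk->sk_shutdown, FAKE_SHUTDOWN_MASK);
}

/* Lockless reader: load the field once into a local, then test the local,
 * so every check in the function sees the same snapshot. */
int fake_poll_is_hup(struct fake_sock *sk)
{
    uint8_t shutdown = READ_ONCE(sk->sk_shutdown);

    return shutdown == FAKE_SHUTDOWN_MASK;
}
```

Reading into a local first matters when the value is tested more than once: without it the compiler is free to reload the field between checks and observe two different values.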

BUG: KCSAN: data-race in unix_poll / unix_release_sock

write to 0xffff88800d0f8aec of 1 bytes by task 264 on cpu 0:
 unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631
 unix_release+0x59/0x80 net/unix/af_unix.c:1042
 __sock_release+0x7d/0x170 net/socket.c:653
 sock_close+0x19/0x30 net/socket.c:1397
 __fput+0x179/0x5e0 fs/file_table.c:321
 ____fput+0x15/0x20 fs/file_table.c:349
 task_work_run+0x116/0x1a0 kernel/task_work.c:179
 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
 syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

read to 0xffff88800d0f8aec of 1 bytes by task 222 on cpu 1:
 unix_poll+0xa3/0x2a0 net/unix/af_unix.c:3170
 sock_poll+0xcf/0x2b0 net/socket.c:1385
 vfs_poll include/linux/poll.h:88 [inline]
 ep_item_poll.isra.0+0x78/0xc0 fs/eventpoll.c:855
 ep_send_events fs/eventpoll.c:1694 [inline]
 ep_poll fs/eventpoll.c:1823 [inline]
 do_epoll_wait+0x6c4/0xea0 fs/eventpoll.c:2258
 __do_sys_epoll_wait fs/eventpoll.c:2270 [inline]
 __se_sys_epoll_wait fs/eventpoll.c:2265 [inline]
 __x64_sys_epoll_wait+0xcc/0x190 fs/eventpoll.c:2265
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

value changed: 0x00 -> 0x03

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 222 Comm: dbus-broker Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014

Fixes: 3c73419c ("af_unix: fix 'poll for write'/ connected DGRAM sockets")
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Fix a data race of sk->sk_receive_queue->qlen. · 679ed006

Committed by Kuniyuki Iwashima on May 09, 2023

KCSAN found a data race of sk->sk_receive_queue->qlen where recvmsg()
updates qlen under the queue lock and sendmsg() checks qlen under
unix_state_lock(), not the queue lock, so the reader side needs
READ_ONCE().

BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_wait_for_peer

write (marked) to 0xffff888019fe7c68 of 4 bytes by task 49792 on cpu 0:
 __skb_unlink include/linux/skbuff.h:2347 [inline]
 __skb_try_recv_from_queue+0x3de/0x470 net/core/datagram.c:197
 __skb_try_recv_datagram+0xf7/0x390 net/core/datagram.c:263
 __unix_dgram_recvmsg+0x109/0x8a0 net/unix/af_unix.c:2452
 unix_dgram_recvmsg+0x94/0xa0 net/unix/af_unix.c:2549
 sock_recvmsg_nosec net/socket.c:1019 [inline]
 ____sys_recvmsg+0x3a3/0x3b0 net/socket.c:2720
 ___sys_recvmsg+0xc8/0x150 net/socket.c:2764
 do_recvmmsg+0x182/0x560 net/socket.c:2858
 __sys_recvmmsg net/socket.c:2937 [inline]
 __do_sys_recvmmsg net/socket.c:2960 [inline]
 __se_sys_recvmmsg net/socket.c:2953 [inline]
 __x64_sys_recvmmsg+0x153/0x170 net/socket.c:2953
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

read to 0xffff888019fe7c68 of 4 bytes by task 49793 on cpu 1:
 skb_queue_len include/linux/skbuff.h:2127 [inline]
 unix_recvq_full net/unix/af_unix.c:229 [inline]
 unix_wait_for_peer+0x154/0x1a0 net/unix/af_unix.c:1445
 unix_dgram_sendmsg+0x13bc/0x14b0 net/unix/af_unix.c:2048
 sock_sendmsg_nosec net/socket.c:724 [inline]
 sock_sendmsg+0x148/0x160 net/socket.c:747
 ____sys_sendmsg+0x20e/0x620 net/socket.c:2503
 ___sys_sendmsg+0xc6/0x140 net/socket.c:2557
 __sys_sendmmsg+0x11d/0x370 net/socket.c:2643
 __do_sys_sendmmsg net/socket.c:2672 [inline]
 __se_sys_sendmmsg net/socket.c:2669 [inline]
 __x64_sys_sendmmsg+0x58/0x70 net/socket.c:2669
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

value changed: 0x0000000b -> 0x00000001

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 49793 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: datagram: fix data-races in datagram_poll() · 5bca1d08

Committed by Eric Dumazet on May 09, 2023

datagram_poll() runs locklessly, so we should add READ_ONCE()
annotations when reading sk->sk_err, sk->sk_shutdown and sk->sk_state.
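
A minimal sketch of what such annotations look like in a lockless poll
routine follows. It is a userspace illustration, not the actual
datagram_poll() code: the READ_ONCE() macro is a simplified stand-in, and
struct demo_sock, demo_poll() and all DEMO_* flag values are hypothetical.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel macro (assumes __typeof__ support). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical flag values, chosen for this sketch only. */
#define DEMO_RCV_SHUTDOWN  0x1u
#define DEMO_SEND_SHUTDOWN 0x2u
#define DEMO_POLLIN        0x0001u
#define DEMO_POLLERR       0x0008u
#define DEMO_POLLHUP       0x0010u
#define DEMO_POLLRDHUP     0x2000u

/* Hypothetical socket: only the fields the poll routine looks at. */
struct demo_sock {
	int err;
	uint8_t shutdown;
};

static inline unsigned int demo_poll(struct demo_sock *sk)
{
	unsigned int mask = 0;
	uint8_t shutdown;

	if (READ_ONCE(sk->err))
		mask |= DEMO_POLLERR;

	/* Load the field once, then test the local copy several times, so
	 * that both checks below see the same value. */
	shutdown = READ_ONCE(sk->shutdown);
	if (shutdown & DEMO_RCV_SHUTDOWN)
		mask |= DEMO_POLLRDHUP | DEMO_POLLIN;
	if (shutdown == (DEMO_RCV_SHUTDOWN | DEMO_SEND_SHUTDOWN))
		mask |= DEMO_POLLHUP;

	return mask;
}
```

Taking a single snapshot of a field that is tested more than once keeps
the derived mask internally consistent even if a writer changes the field
in between.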

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230509173131.3263780-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: re-sort all entries and fields · 80e62bc8

Committed by Linus Torvalds on May 10, 2023

It's been a few years since we've sorted this thing, and the end result
is that we've added MAINTAINERS entries in the wrong order, and a number
of entries have their fields in non-canonical order too.

So roll this boulder up the hill one more time by re-running

   ./scripts/parse-maintainers.pl --order

on it.

This file ends up being fairly painful for merge conflicts even
normally, since unlike almost all other kernel files it's one of those
"everybody touches the same thing", and re-ordering all entries is only
going to make that worse.  But the alternative is to never do it at all,
and just let it all rot..

The rc2 week is likely the quietest and least painful time to do this.

Requested-by: Randy Dunlap <rdunlap@infradead.org>
Requested-by: Joe Perches <joe@perches.com>	# "Please use --order"
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'fsnotify_for_v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · d295b66a

Committed by Linus Torvalds on May 10, 2023

Pull inotify fix from Jan Kara:
 "A fix for possibly reporting invalid watch descriptor with inotify
  event"

* tag 'fsnotify_for_v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  inotify: Avoid reporting event with invalid wd

Merge tag 'gfs2-v6.3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 2a78769d

Committed by Linus Torvalds on May 10, 2023

Pull gfs2 fix from Andreas Gruenbacher:

 - Fix a NULL pointer dereference when mounting corrupted filesystems

* tag 'gfs2-v6.3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Don't deref jdesc in evict

10 May, 2023 4 commits

gfs2: Don't deref jdesc in evict · 504a10d9

Committed by Bob Peterson on Apr 28, 2023

On corrupt gfs2 file systems the evict code can try to reference the
journal descriptor structure, jdesc, after it has been freed and set to
NULL. The sequence of events is:

init_journal()
...
fail_jindex:
   gfs2_jindex_free(sdp); <------frees journals, sets jdesc = NULL
      if (gfs2_holder_initialized(&ji_gh))
         gfs2_glock_dq_uninit(&ji_gh);
fail:
   iput(sdp->sd_jindex); <--references jdesc in evict_linked_inode
      evict()
         gfs2_evict_inode()
            evict_linked_inode()
               ret = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
<------references the now freed/zeroed sd_jdesc pointer.

The call to gfs2_trans_begin is done because the truncate_inode_pages
call can cause gfs2 events that require a transaction, such as removing
journaled data (jdata) blocks from the journal.

This patch fixes the problem by adding a check for sdp->sd_jdesc to
function gfs2_evict_inode. In theory, this should only happen to corrupt
gfs2 file systems, when gfs2 detects the problem, reports it, then tries
to evict all the system inodes it has read in up to that point.
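
The kind of guard described above can be sketched as follows. This is a
simplified userspace illustration, not the actual gfs2 change: struct
demo_sbd, struct demo_jdesc, demo_trans_begin() and demo_evict() are
hypothetical stand-ins for the real structures and functions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the journal descriptor and superblock data. */
struct demo_jdesc {
	unsigned int jd_blocks;
};

struct demo_sbd {
	struct demo_jdesc *sd_jdesc; /* NULL once the journals are freed */
};

/* Stand-in for starting a transaction sized by the journal. */
static inline int demo_trans_begin(struct demo_sbd *sdp, unsigned int blocks)
{
	(void)sdp;
	return blocks > 0 ? 0 : -1;
}

/*
 * Returns 1 if a transaction was started, 0 if eviction work was skipped
 * because there is no journal, or a negative value on error.
 */
static inline int demo_evict(struct demo_sbd *sdp)
{
	int ret;

	/* Without a journal there is nothing to reserve; dereferencing
	 * sd_jdesc here would be the NULL pointer bug. */
	if (!sdp->sd_jdesc)
		return 0;

	ret = demo_trans_begin(sdp, sdp->sd_jdesc->jd_blocks);
	return ret ? ret : 1;
}
```

The check is placed before the first dereference, so the failed-mount
path can still drop its inode references safely.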

Reported-by: Yang Lan <lanyang0908@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

Merge tag 'platform-drivers-x86-v6.4-2' of... · ad2fd53a

Committed by Linus Torvalds on May 10, 2023

Merge tag 'platform-drivers-x86-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:
 "Nothing special to report just various small fixes:

   - thinkpad_acpi: Fix profile (performance/bal/low-power) regression
     on T490

   - misc other small fixes / hw-id additions"

* tag 'platform-drivers-x86-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/mellanox: fix potential race in mlxbf-tmfifo driver
  platform/x86: touchscreen_dmi: Add info for the Dexp Ursus KX210i
  platform/x86: touchscreen_dmi: Add upside-down quirk for GDIX1002 ts on the Juno Tablet
  platform/x86: thinkpad_acpi: Add profile force ability
  platform/x86: thinkpad_acpi: Fix platform profiles on T490
  platform/x86: hp-wmi: add micmute to hp_wmi_keymap struct
  platform/x86/intel-uncore-freq: Return error on write frequency
  platform/x86: intel_scu_pcidrv: Add back PCI ID for Medfield

net: mscc: ocelot: fix stat counter register values · cdc2e28e

Committed by Colin Foster on May 09, 2023

Commit d4c36765 ("net: mscc: ocelot: keep ocelot_stat_layout by reg
address, not offset") organized the stats counters for Ocelot chips, namely
the VSC7512 and VSC7514. A few of the counter offsets were incorrect, and
were caught by this warning:

WARNING: CPU: 0 PID: 24 at drivers/net/ethernet/mscc/ocelot_stats.c:909
ocelot_stats_init+0x1fc/0x2d8
reg 0x5000078 had address 0x220 but reg 0x5000079 has address 0x214,
bulking broken!

Fix these register offsets.
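
The warning suggests a consistency check that consecutive stat registers
must have increasing addresses so that they can be read in bulk. A
minimal sketch of such a check, under that assumption, is shown below;
struct demo_stat_reg and demo_first_unordered() are hypothetical names
and this is not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout entry: a logical register id and its hardware address. */
struct demo_stat_reg {
	uint32_t reg;
	uint32_t addr;
};

/*
 * Return the index of the first entry whose address does not increase
 * relative to its predecessor, or -1 if the whole table is ascending.
 */
static inline int demo_first_unordered(const struct demo_stat_reg *tbl,
				       size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (tbl[i - 1].addr >= tbl[i].addr)
			return (int)i;
	}
	return -1;
}
```

With the pair from the warning above (address 0x220 followed by 0x214),
the function would report the second entry as out of order.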

Fixes: d4c36765 ("net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset")
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipvlan:Fix out-of-bounds caused by unclear skb->cb · 90cbed52

Committed by t.feng on May 10, 2023

When an skb is enqueued to the qdisc, fq_skb_cb(skb)->time_to_send is
written, and that field lives in skb->cb. IPCB(skb_in)->opt, which
aliases the same skb->cb bytes, is later used in __ip_options_echo().
The stale data can make memcpy() go out of bounds and write past the end
of a stack buffer.
We should clear skb->cb before calling ip_local_out() or ip6_local_out().
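
The aliasing problem can be sketched in userspace C. This is an
illustration only, not the ipvlan patch: struct demo_pkt and the demo_*
helpers are hypothetical, and the layouts of the two views are invented
for the example (only the idea of a scratch area shared by successive
layers is taken from the description above).

```c
#include <stdint.h>
#include <string.h>

#define DEMO_CB_SIZE 48

/* Hypothetical packet with a scratch area shared by successive layers. */
struct demo_pkt {
	unsigned char cb[DEMO_CB_SIZE];
};

/* One layer's view of the scratch area (e.g. a queueing discipline). */
struct demo_sched_cb {
	uint64_t time_to_send;
};

/* Another layer's view of the same bytes (e.g. IP option bookkeeping). */
struct demo_ip_cb {
	unsigned char optlen;
	unsigned char pad[7];
};

/* The scheduler stamps its private data into the scratch area. */
static inline void demo_sched_enqueue(struct demo_pkt *p, uint64_t when)
{
	struct demo_sched_cb scb = { .time_to_send = when };

	memcpy(p->cb, &scb, sizeof(scb));
}

/* The IP layer interprets the same bytes as an option length. */
static inline unsigned char demo_ip_optlen(const struct demo_pkt *p)
{
	struct demo_ip_cb icb;

	memcpy(&icb, p->cb, sizeof(icb));
	return icb.optlen;
}

/* Reset the IP view before handing the packet to the IP output path. */
static inline void demo_clear_ip_cb(struct demo_pkt *p)
{
	memset(p->cb, 0, sizeof(struct demo_ip_cb));
}
```

Without the clearing step, whatever the previous layer stored is read
back as a length, which is how a later copy can be sized from garbage.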

v2:
1. clean the stack info
2. use IPCB/IP6CB instead of skb->cb

Crash on stable-5.10 (reproduced on a KASAN kernel).
Stack info:
[ 2203.651571] BUG: KASAN: stack-out-of-bounds in
__ip_options_echo+0x589/0x800
[ 2203.653327] Write of size 4 at addr ffff88811a388f27 by task
swapper/3/0
[ 2203.655460] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted
5.10.0-60.18.0.50.h856.kasan.eulerosv2r11.x86_64 #1
[ 2203.655466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000 04/01/2014
[ 2203.655475] Call Trace:
[ 2203.655481]  <IRQ>
[ 2203.655501]  dump_stack+0x9c/0xd3
[ 2203.655514]  print_address_description.constprop.0+0x19/0x170
[ 2203.655530]  __kasan_report.cold+0x6c/0x84
[ 2203.655586]  kasan_report+0x3a/0x50
[ 2203.655594]  check_memory_region+0xfd/0x1f0
[ 2203.655601]  memcpy+0x39/0x60
[ 2203.655608]  __ip_options_echo+0x589/0x800
[ 2203.655654]  __icmp_send+0x59a/0x960
[ 2203.655755]  nf_send_unreach+0x129/0x3d0 [nf_reject_ipv4]
[ 2203.655763]  reject_tg+0x77/0x1bf [ipt_REJECT]
[ 2203.655772]  ipt_do_table+0x691/0xa40 [ip_tables]
[ 2203.655821]  nf_hook_slow+0x69/0x100
[ 2203.655828]  __ip_local_out+0x21e/0x2b0
[ 2203.655857]  ip_local_out+0x28/0x90
[ 2203.655868]  ipvlan_process_v4_outbound+0x21e/0x260 [ipvlan]
[ 2203.655931]  ipvlan_xmit_mode_l3+0x3bd/0x400 [ipvlan]
[ 2203.655967]  ipvlan_queue_xmit+0xb3/0x190 [ipvlan]
[ 2203.655977]  ipvlan_start_xmit+0x2e/0xb0 [ipvlan]
[ 2203.655984]  xmit_one.constprop.0+0xe1/0x280
[ 2203.655992]  dev_hard_start_xmit+0x62/0x100
[ 2203.656000]  sch_direct_xmit+0x215/0x640
[ 2203.656028]  __qdisc_run+0x153/0x1f0
[ 2203.656069]  __dev_queue_xmit+0x77f/0x1030
[ 2203.656173]  ip_finish_output2+0x59b/0xc20
[ 2203.656244]  __ip_finish_output.part.0+0x318/0x3d0
[ 2203.656312]  ip_finish_output+0x168/0x190
[ 2203.656320]  ip_output+0x12d/0x220
[ 2203.656357]  __ip_queue_xmit+0x392/0x880
[ 2203.656380]  __tcp_transmit_skb+0x1088/0x11c0
[ 2203.656436]  __tcp_retransmit_skb+0x475/0xa30
[ 2203.656505]  tcp_retransmit_skb+0x2d/0x190
[ 2203.656512]  tcp_retransmit_timer+0x3af/0x9a0
[ 2203.656519]  tcp_write_timer_handler+0x3ba/0x510
[ 2203.656529]  tcp_write_timer+0x55/0x180
[ 2203.656542]  call_timer_fn+0x3f/0x1d0
[ 2203.656555]  expire_timers+0x160/0x200
[ 2203.656562]  run_timer_softirq+0x1f4/0x480
[ 2203.656606]  __do_softirq+0xfd/0x402
[ 2203.656613]  asm_call_irq_on_stack+0x12/0x20
[ 2203.656617]  </IRQ>
[ 2203.656623]  do_softirq_own_stack+0x37/0x50
[ 2203.656631]  irq_exit_rcu+0x134/0x1a0
[ 2203.656639]  sysvec_apic_timer_interrupt+0x36/0x80
[ 2203.656646]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 2203.656654] RIP: 0010:default_idle+0x13/0x20
[ 2203.656663] Code: 89 f0 5d 41 5c 41 5d 41 5e c3 cc cc cc cc cc cc cc
cc cc cc cc cc cc 0f 1f 44 00 00 0f 1f 44 00 00 0f 00 2d 9f 32 57 00 fb
f4 <c3> cc cc cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 be 08
[ 2203.656668] RSP: 0018:ffff88810036fe78 EFLAGS: 00000256
[ 2203.656676] RAX: ffffffffaf2a87f0 RBX: ffff888100360000 RCX:
ffffffffaf290191
[ 2203.656681] RDX: 0000000000098b5e RSI: 0000000000000004 RDI:
ffff88811a3c4f60
[ 2203.656686] RBP: 0000000000000000 R08: 0000000000000001 R09:
ffff88811a3c4f63
[ 2203.656690] R10: ffffed10234789ec R11: 0000000000000001 R12:
0000000000000003
[ 2203.656695] R13: ffff888100360000 R14: 0000000000000000 R15:
0000000000000000
[ 2203.656729]  default_idle_call+0x5a/0x150
[ 2203.656735]  cpuidle_idle_call+0x1c6/0x220
[ 2203.656780]  do_idle+0xab/0x100
[ 2203.656786]  cpu_startup_entry+0x19/0x20
[ 2203.656793]  secondary_startup_64_no_verify+0xc2/0xcb

[ 2203.657409] The buggy address belongs to the page:
[ 2203.658648] page:0000000027a9842f refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0x11a388
[ 2203.658665] flags:
0x17ffffc0001000(reserved|node=0|zone=2|lastcpupid=0x1fffff)
[ 2203.658675] raw: 0017ffffc0001000 ffffea000468e208 ffffea000468e208
0000000000000000
[ 2203.658682] raw: 0000000000000000 0000000000000000 00000001ffffffff
0000000000000000
[ 2203.658686] page dumped because: kasan: bad access detected

To reproduce (ipvlan with IPVLAN_MODE_L3):
Environment setup:
=======================================================
modprobe ipvlan ipvlan_default_mode=1
sysctl net.ipv4.conf.eth0.forwarding=1
iptables -t nat -A POSTROUTING -s 20.0.0.0/255.255.255.0 -o eth0 -j MASQUERADE
ip link add gw link eth0 type ipvlan
ip -4 addr add 20.0.0.254/24 dev gw
ip netns add net1
ip link add ipv1 link eth0 type ipvlan
ip link set ipv1 netns net1
ip netns exec net1 ip link set ipv1 up
ip netns exec net1 ip -4 addr add 20.0.0.4/24 dev ipv1
ip netns exec net1 route add default gw 20.0.0.254
ip netns exec net1 tc qdisc add dev ipv1 root netem loss 10%
ifconfig gw up
iptables -t filter -A OUTPUT -p tcp --dport 8888 -j REJECT --reject-with icmp-port-unreachable
=======================================================
Then run the following shell loop (curl any address that is reachable
via eth0):

for((i=1;i<=100000;i++))
do
        ip netns exec net1 curl x.x.x.x:8888
done
=======================================================

Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: "t.feng" <fengtao40@huawei.com>
Suggested-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
