提交 · 82041634d96e87b41c600a673f10150d9f21f742 · openeuler / Kernel

19 5月, 2021 12 次提交

net/mlx5: SF, Fix show state inactive when its inactivated · 82041634

由 Parav Pandit 提交于 5月 07, 2021

When a SF is inactivated and when it is in a TEARDOWN_REQUEST
state, driver still returns its state as active. This is incorrect.
Fix it by treating TEARDOWN_REQEUST as inactive state. When a SF
is still attached to the driver, on user request to reactivate EINVAL
error is returned. Inform user about it with better code EBUSY and
informative error message.

Fixes: 6a327321 ("net/mlx5: SF, Port function state change support")
Signed-off-by: NParav Pandit <parav@nvidia.com>
Reviewed-by: NVu Pham <vuhuong@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

82041634

net/mlx5: Fix err prints and return when creating termination table · fca08661

由 Roi Dayan 提交于 5月 13, 2021

Fix print to print correct error code and not using IS_ERR() which
will just result in always printing 1.
Also return real err instead of always -EOPNOTSUPP.

Fixes: 10caabda ("net/mlx5e: Use termination table for VLAN push actions")
Signed-off-by: NRoi Dayan <roid@nvidia.com>
Reviewed-by: NMaor Dickman <maord@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

fca08661

net/mlx5: Set reformat action when needed for termination rules · 442b3d7b

由 Jianbo Liu 提交于 4月 30, 2021

For remote mirroring, after the tunnel packets are received, they are
decapsulated and sent to representor, then re-encapsulated and sent
out over another tunnel. So reformat action is set only when the
destination is required to do encapsulation.

Fixes: 249ccc3c ("net/mlx5e: Add support for offloading traffic from uplink to uplink")
Signed-off-by: NJianbo Liu <jianbol@nvidia.com>
Reviewed-by: NAriel Levkovich <lariel@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

442b3d7b

net/mlx5e: Fix nullptr in add_vlan_push_action() · dca59f4a

由 Dima Chumak 提交于 4月 26, 2021

The result of dev_get_by_index_rcu() is not checked for NULL and then
gets dereferenced immediately.

Also, the RCU lock must be held by the caller of dev_get_by_index_rcu(),
which isn't satisfied by the call stack.

Fix by handling nullptr return value when iflink device is not found.
Add RCU locking around dev_get_by_index_rcu() to avoid possible adverse
effects while iterating over the net_device's hlist.

It is safe not to increment reference count of the net_device pointer in
case of a successful lookup, because it's already handled by VLAN code
during VLAN device registration (see register_vlan_dev and
netdev_upper_dev_link).

Fixes: 278748a9 ("net/mlx5e: Offload TC e-switch rules with egress VLAN device")
Addresses-Coverity: ("Dereference null return value")
Signed-off-by: NDima Chumak <dchumak@nvidia.com>
Reviewed-by: NVlad Buslov <vladbu@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

dca59f4a

{net, RDMA}/mlx5: Fix override of log_max_qp by other device · 3410fbcd

由 Maor Gottlieb 提交于 5月 12, 2021

mlx5_core_dev holds pointer to static profile, hence when the
log_max_qp of the profile is override by some device, then it
effect all other mlx5 devices that share the same profile.
Fix it by having a profile instance for every mlx5 device.

Fixes: 883371c4 ("net/mlx5: Check FW limitations on log_max_qp before setting it")
Signed-off-by: NMaor Gottlieb <maorg@nvidia.com>
Reviewed-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

3410fbcd

Merge branch 'hns3-fixes' · c9fd37a9

由 David S. Miller 提交于 5月 18, 2021

Huazhong Tan says:

====================
net: hns3: fixes for -net

This series includes some bugfixes for the HNS3 ethernet driver.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9fd37a9

net: hns3: check the return of skb_checksum_help() · 9bb5a495

由 Yunsheng Lin 提交于 5月 18, 2021

Currently skb_checksum_help()'s return is ignored, but it may
return error when it fails to allocate memory when linearizing.

So adds checking for the return of skb_checksum_help().

Fixes: 76ad4f0e("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Fixes: 3db084d2("net: hns3: Fix for vxlan tx checksum bug")
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: NHuazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bb5a495

net: hns3: fix user's coalesce configuration lost issue · 73a13d8d

由 Huazhong Tan 提交于 5月 18, 2021

Currently, when adaptive is on, the user's coalesce configuration
may be overwritten by the dynamic one. The reason is that user's
configurations are saved in struct hns3_enet_tqp_vector whose
value maybe changed by the dynamic algorithm. To fix it, use
struct hns3_nic_priv instead of struct hns3_enet_tqp_vector to
save and get the user's configuration.

BTW, operations of storing and restoring coalesce info in the reset
process are unnecessary now, so remove them as well.

Fixes: 434776a5 ("net: hns3: add ethtool_ops.set_coalesce support to PF")
Fixes: 7e96adc4 ("net: hns3: add ethtool_ops.get_coalesce support to PF")
Signed-off-by: NHuazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73a13d8d

net: hns3: put off calling register_netdev() until client initialize complete · a289a7e5

由 Jian Shen 提交于 5月 18, 2021

Currently, the netdevice is registered before client initializing
complete. So there is a timewindow between netdevice available
and usable. In this case, if user try to change the channel number
or ring param, it may cause the hns3_set_rx_cpu_rmap() being called
twice, and report bug.

[47199.416502] hns3 0000:35:00.0 eth1: set channels: tqp_num=1, rxfh=0
[47199.430340] hns3 0000:35:00.0 eth1: already uninitialized
[47199.438554] hns3 0000:35:00.0: rss changes from 4 to 1
[47199.511854] hns3 0000:35:00.0: Channels changed, rss_size from 4 to 1, tqps from 4 to 1
[47200.163524] ------------[ cut here ]------------
[47200.171674] kernel BUG at lib/cpu_rmap.c:142!
[47200.177847] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[47200.185259] Modules linked in: hclge(+) hns3(-) hns3_cae(O) hns_roce_hw_v2 hnae3 vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O) [last unloaded: hclge]
[47200.205912] CPU: 1 PID: 8260 Comm: ethtool Tainted: G           O      5.11.0-rc3+ #1
[47200.215601] Hardware name:  , xxxxxx 02/04/2021
[47200.223052] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[47200.230188] pc : cpu_rmap_add+0x38/0x40
[47200.237472] lr : irq_cpu_rmap_add+0x84/0x140
[47200.243291] sp : ffff800010e93a30
[47200.247295] x29: ffff800010e93a30 x28: ffff082100584880
[47200.254155] x27: 0000000000000000 x26: 0000000000000000
[47200.260712] x25: 0000000000000000 x24: 0000000000000004
[47200.267241] x23: ffff08209ba03000 x22: ffff08209ba038c0
[47200.273789] x21: 000000000000003f x20: ffff0820e2bc1680
[47200.280400] x19: ffff0820c970ec80 x18: 00000000000000c0
[47200.286944] x17: 0000000000000000 x16: ffffb43debe4a0d0
[47200.293456] x15: fffffc2082990600 x14: dead000000000122
[47200.300059] x13: ffffffffffffffff x12: 000000000000003e
[47200.306606] x11: ffff0820815b8080 x10: ffff53e411988000
[47200.313171] x9 : 0000000000000000 x8 : ffff0820e2bc1700
[47200.319682] x7 : 0000000000000000 x6 : 000000000000003f
[47200.326170] x5 : 0000000000000040 x4 : ffff800010e93a20
[47200.332656] x3 : 0000000000000004 x2 : ffff0820c970ec80
[47200.339168] x1 : ffff0820e2bc1680 x0 : 0000000000000004
[47200.346058] Call trace:
[47200.349324]  cpu_rmap_add+0x38/0x40
[47200.354300]  hns3_set_rx_cpu_rmap+0x6c/0xe0 [hns3]
[47200.362294]  hns3_reset_notify_init_enet+0x1cc/0x340 [hns3]
[47200.370049]  hns3_change_channels+0x40/0xb0 [hns3]
[47200.376770]  hns3_set_channels+0x12c/0x2a0 [hns3]
[47200.383353]  ethtool_set_channels+0x140/0x250
[47200.389772]  dev_ethtool+0x714/0x23d0
[47200.394440]  dev_ioctl+0x4cc/0x640
[47200.399277]  sock_do_ioctl+0x100/0x2a0
[47200.404574]  sock_ioctl+0x28c/0x470
[47200.409079]  __arm64_sys_ioctl+0xb4/0x100
[47200.415217]  el0_svc_common.constprop.0+0x84/0x210
[47200.422088]  do_el0_svc+0x28/0x34
[47200.426387]  el0_svc+0x28/0x70
[47200.431308]  el0_sync_handler+0x1a4/0x1b0
[47200.436477]  el0_sync+0x174/0x180
[47200.441562] Code: 11000405 79000c45 f8247861 d65f03c0 (d4210000)
[47200.448869] ---[ end trace a01efe4ce42e5f34 ]---

The process is like below:
excuting hns3_client_init
|
register_netdev()
|                           hns3_set_channels()
|                           |
hns3_set_rx_cpu_rmap()      hns3_reset_notify_uninit_enet()
|                               |
|                            quit without calling function
|                            hns3_free_rx_cpu_rmap for flag
|                            HNS3_NIC_STATE_INITED is unset.
|                           |
|                           hns3_reset_notify_init_enet()
|                               |
set HNS3_NIC_STATE_INITED    call hns3_set_rx_cpu_rmap()-- crash

Fix it by calling register_netdev() at the end of function
hns3_client_init().

Fixes: 08a10068 ("net: hns3: re-organize vector handle")
Signed-off-by: NJian Shen <shenjian15@huawei.com>
Signed-off-by: NHuazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a289a7e5

net: hns3: fix incorrect resp_msg issue · a710b9ff

由 Jiaran Zhang 提交于 5月 18, 2021

In hclge_mbx_handler(), if there are two consecutive mailbox
messages that requires resp_msg, the resp_msg is not cleared
after processing the first message, which will cause the resp_msg
data of second message incorrect.

Fix it by clearing the resp_msg before processing every mailbox
message.

Fixes: bb5790b7 ("net: hns3: refactor mailbox response scheme between PF and VF")
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: NHuazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a710b9ff

net: lan78xx: advertise tx software timestamping support · 33e6b167

由 Markus Bloechl 提交于 5月 18, 2021

lan78xx already calls skb_tx_timestamp() in its lan78xx_start_xmit().
Override .get_ts_info to also advertise this capability
(SOF_TIMESTAMPING_TX_SOFTWARE) via ethtool.
Signed-off-by: NMarkus Blöchl <markus.bloechl@ipetronik.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33e6b167

tipc: simplify the finalize work queue · be07f056

由 Xin Long 提交于 5月 18, 2021

This patch is to use "struct work_struct" for the finalize work queue
instead of "struct tipc_net_work", as it can get the "net" and "addr"
from tipc_net's other members and there is no need to add extra net
and addr in tipc_net by defining "struct tipc_net_work".

Note that it's safe to get net from tn->bcl as bcl is always released
after the finalize work queue is done.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NJon Maloy <jmaloy@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

be07f056

18 5月, 2021 21 次提交

net: mdiobus: get rid of a BUG_ON() · 1dde47a6

由 Dan Carpenter 提交于 5月 17, 2021

We spotted a bug recently during a review where a driver was
unregistering a bus that wasn't registered, which would trigger this
BUG_ON().  Let's handle that situation more gracefully, and just print
a warning and return.
Reported-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1dde47a6

Merge branch 'gve-fixes' · 37781fd2

由 David S. Miller 提交于 5月 17, 2021

David Awogbemila says:

====================
GVE bug fixes

This patch series includes fixes to some bugs in the gve driver.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37781fd2

gve: Correct SKB queue index validation. · fbd4a28b

由 David Awogbemila 提交于 5月 17, 2021

SKBs with skb_get_queue_mapping(skb) == tx_cfg.num_queues should also be
considered invalid.

Fixes: f5cedc84 ("gve: Add transmit and receive support")
Signed-off-by: NDavid Awogbemila <awogbemila@google.com>
Acked-by: NWillem de Brujin <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fbd4a28b

gve: Upgrade memory barrier in poll routine · f8178183

由 Catherine Sullivan 提交于 5月 17, 2021

As currently written, if the driver checks for more work (via
gve_tx_poll or gve_rx_poll) before the device posts work and the
irq doorbell is not unmasked
(via iowrite32be(GVE_IRQ_ACK | GVE_IRQ_EVENT, ...)) before the device
attempts to raise an interrupt, an interrupt is lost and this could
potentially lead to the traffic being completely halted. For
example, if a tx queue has already been stopped, the driver won't get
the chance to complete work and egress will be halted.

We need a full memory barrier in the poll
routine to ensure that the irq doorbell is unmasked before the driver
checks for more work.

Fixes: f5cedc84 ("gve: Add transmit and receive support")
Signed-off-by: NCatherine Sullivan <csully@google.com>
Signed-off-by: NDavid Awogbemila <awogbemila@google.com>
Acked-by: NWillem de Brujin <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8178183

gve: Add NULL pointer checks when freeing irqs. · 5218e919

由 David Awogbemila 提交于 5月 17, 2021

When freeing notification blocks, we index priv->msix_vectors.
If we failed to allocate priv->msix_vectors (see abort_with_msix_vectors)
this could lead to a NULL pointer dereference if the driver is unloaded.

Fixes: 893ce44d ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Signed-off-by: NDavid Awogbemila <awogbemila@google.com>
Acked-by: NWillem de Brujin <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5218e919

gve: Update mgmt_msix_idx if num_ntfy changes · e96b491a

由 David Awogbemila 提交于 5月 17, 2021

If we do not get the expected number of vectors from
pci_enable_msix_range, we update priv->num_ntfy_blks but not
priv->mgmt_msix_idx. This patch fixes this so that priv->mgmt_msix_idx
is updated accordingly.

Fixes: f5cedc84 ("gve: Add transmit and receive support")
Signed-off-by: NDavid Awogbemila <awogbemila@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e96b491a

gve: Check TX QPL was actually assigned · 5aec55b4

由 Catherine Sullivan 提交于 5月 17, 2021

Correctly check the TX QPL was assigned and unassigned if
other steps in the allocation fail.

Fixes: f5cedc84 (gve: Add transmit and receive support)
Signed-off-by: NCatherine Sullivan <csully@google.com>
Signed-off-by: NDavid Awogbemila <awogbemila@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5aec55b4

netlink: disable IRQs for netlink_lock_table() · 1d482e66

由 Johannes Berg 提交于 5月 17, 2021

Syzbot reports that in mac80211 we have a potential deadlock
between our "local->stop_queue_reasons_lock" (spinlock) and
netlink's nl_table_lock (rwlock). This is because there's at
least one situation in which we might try to send a netlink
message with this spinlock held while it is also possible to
take the spinlock from a hardirq context, resulting in the
following deadlock scenario reported by lockdep:

       CPU0                    CPU1
       ----                    ----
  lock(nl_table_lock);
                               local_irq_disable();
                               lock(&local->queue_stop_reason_lock);
                               lock(nl_table_lock);
  <Interrupt>
    lock(&local->queue_stop_reason_lock);

This seems valid, we can take the queue_stop_reason_lock in
any kind of context ("CPU0"), and call ieee80211_report_ack_skb()
with the spinlock held and IRQs disabled ("CPU1") in some
code path (ieee80211_do_stop() via ieee80211_free_txskb()).

Short of disallowing netlink use in scenarios like these
(which would be rather complex in mac80211's case due to
the deep callchain), it seems the only fix for this is to
disable IRQs while nl_table_lock is held to avoid hitting
this scenario, this disallows the "CPU0" portion of the
reported deadlock.

Note that the writer side (netlink_table_grab()) already
disables IRQs for this lock.

Unfortunately though, this seems like a huge hammer, and
maybe the whole netlink table locking should be reworked.

Reported-by: syzbot+69ff9dff50dcfe14ddd4@syzkaller.appspotmail.com
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d482e66

net/smc: remove device from smcd_dev_list after failed device_add() · 444d7be9

由 Julian Wiedmann 提交于 5月 17, 2021

If the device_add() for a smcd_dev fails, there's no cleanup step that
rolls back the earlier list_add(). The device subsequently gets freed,
and we end up with a corrupted list.

Add some error handling that removes the device from the list.

Fixes: c6ba7c9b ("net/smc: add base infrastructure for SMC-D and ISM")
Signed-off-by: NJulian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

444d7be9

bonding: init notify_work earlier to avoid uninitialized use · 35d96e63

由 Johannes Berg 提交于 5月 17, 2021

If bond_kobj_init() or later kzalloc() in bond_alloc_slave() fail,
then we call kobject_put() on the slave->kobj. This in turn calls
the release function slave_kobj_release() which will always try to
cancel_delayed_work_sync(&slave->notify_work), which shouldn't be
done on an uninitialized work struct.

Always initialize the work struct earlier to avoid problems here.

Syzbot bisected this down to a completely pointless commit, some
fault injection may have been at work here that caused the alloc
failure in the first place, which may interact badly with bisect.

Reported-by: syzbot+bfda097c12a00c8cae67@syzkaller.appspotmail.com
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35d96e63

MAINTAINERS: net: remove stale website link · 3c814519

由 Krzysztof Kozlowski 提交于 5月 17, 2021

The http://www.linuxfoundation.org/en/Net does not contain networking
subsystem description ("Nothing found").
Signed-off-by: NKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c814519

tipc: wait and exit until all work queues are done · 04c26faa

由 Xin Long 提交于 5月 17, 2021

On some host, a crash could be triggered simply by repeating these
commands several times:

  # modprobe tipc
  # tipc bearer enable media udp name UDP1 localip 127.0.0.1
  # rmmod tipc

  [] BUG: unable to handle kernel paging request at ffffffffc096bb00
  [] Workqueue: events 0xffffffffc096bb00
  [] Call Trace:
  []  ? process_one_work+0x1a7/0x360
  []  ? worker_thread+0x30/0x390
  []  ? create_worker+0x1a0/0x1a0
  []  ? kthread+0x116/0x130
  []  ? kthread_flush_work_fn+0x10/0x10
  []  ? ret_from_fork+0x35/0x40

When removing the TIPC module, the UDP tunnel sock will be delayed to
release in a work queue as sock_release() can't be done in rtnl_lock().
If the work queue is schedule to run after the TIPC module is removed,
kernel will crash as the work queue function cleanup_beareri() code no
longer exists when trying to invoke it.

To fix it, this patch introduce a member wq_count in tipc_net to track
the numbers of work queues in schedule, and  wait and exit until all
work queues are done in tipc_exit_net().

Fixes: d0f91938 ("tipc: add ip/udp media type")
Reported-by: NShuang Li <shuali@redhat.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NJon Maloy <jmaloy@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04c26faa

mld: fix panic in mld_newpack() · 020ef930

由 Taehee Yoo 提交于 5月 16, 2021

mld_newpack() doesn't allow to allocate high order page,
only order-0 allocation is allowed.
If headroom size is too large, a kernel panic could occur in skb_put().

Test commands:
    ip netns del A
    ip netns del B
    ip netns add A
    ip netns add B
    ip link add veth0 type veth peer name veth1
    ip link set veth0 netns A
    ip link set veth1 netns B

    ip netns exec A ip link set lo up
    ip netns exec A ip link set veth0 up
    ip netns exec A ip -6 a a 2001:db8:0::1/64 dev veth0
    ip netns exec B ip link set lo up
    ip netns exec B ip link set veth1 up
    ip netns exec B ip -6 a a 2001:db8:0::2/64 dev veth1
    for i in {1..99}
    do
        let A=$i-1
        ip netns exec A ip link add ip6gre$i type ip6gre \
	local 2001:db8:$A::1 remote 2001:db8:$A::2 encaplimit 100
        ip netns exec A ip -6 a a 2001:db8:$i::1/64 dev ip6gre$i
        ip netns exec A ip link set ip6gre$i up

        ip netns exec B ip link add ip6gre$i type ip6gre \
	local 2001:db8:$A::2 remote 2001:db8:$A::1 encaplimit 100
        ip netns exec B ip -6 a a 2001:db8:$i::2/64 dev ip6gre$i
        ip netns exec B ip link set ip6gre$i up
    done

Splat looks like:
kernel BUG at net/core/skbuff.c:110!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.12.0+ #891
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:skb_panic+0x15d/0x15f
Code: 92 fe 4c 8b 4c 24 10 53 8b 4d 70 45 89 e0 48 c7 c7 00 ae 79 83
41 57 41 56 41 55 48 8b 54 24 a6 26 f9 ff <0f> 0b 48 8b 6c 24 20 89
34 24 e8 4a 4e 92 fe 8b 34 24 48 c7 c1 20
RSP: 0018:ffff88810091f820 EFLAGS: 00010282
RAX: 0000000000000089 RBX: ffff8881086e9000 RCX: 0000000000000000
RDX: 0000000000000089 RSI: 0000000000000008 RDI: ffffed1020123efb
RBP: ffff888005f6eac0 R08: ffffed1022fc0031 R09: ffffed1022fc0031
R10: ffff888117e00187 R11: ffffed1022fc0030 R12: 0000000000000028
R13: ffff888008284eb0 R14: 0000000000000ed8 R15: 0000000000000ec0
FS:  0000000000000000(0000) GS:ffff888117c00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8b801c5640 CR3: 0000000033c2c006 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
 ? ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
 skb_put.cold.104+0x22/0x22
 ip6_mc_hdr.isra.26.constprop.46+0x12a/0x600
 ? rcu_read_lock_sched_held+0x91/0xc0
 mld_newpack+0x398/0x8f0
 ? ip6_mc_hdr.isra.26.constprop.46+0x600/0x600
 ? lock_contended+0xc40/0xc40
 add_grhead.isra.33+0x280/0x380
 add_grec+0x5ca/0xff0
 ? mld_sendpack+0xf40/0xf40
 ? lock_downgrade+0x690/0x690
 mld_send_initial_cr.part.34+0xb9/0x180
 ipv6_mc_dad_complete+0x15d/0x1b0
 addrconf_dad_completed+0x8d2/0xbb0
 ? lock_downgrade+0x690/0x690
 ? addrconf_rs_timer+0x660/0x660
 ? addrconf_dad_work+0x73c/0x10e0
 addrconf_dad_work+0x73c/0x10e0

Allowing high order page allocation could fix this problem.

Fixes: 72e09ad1 ("ipv6: avoid high order allocations")
Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

020ef930

isdn: mISDN: netjet: Fix crash in nj_probe: · 9f6f8525

由 Zheyu Ma 提交于 5月 16, 2021

'nj_setup' in netjet.c might fail with -EIO and in this case
'card->irq' is initialized and is bigger than zero. A subsequent call to
'nj_release' will free the irq that has not been requested.

Fix this bug by deleting the previous assignment to 'card->irq' and just
keep the assignment before 'request_irq'.

The KASAN's log reveals it:

[    3.354615 ] WARNING: CPU: 0 PID: 1 at kernel/irq/manage.c:1826
free_irq+0x100/0x480
[    3.355112 ] Modules linked in:
[    3.355310 ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.13.0-rc1-00144-g25a12987 #13
[    3.355816 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[    3.356552 ] RIP: 0010:free_irq+0x100/0x480
[    3.356820 ] Code: 6e 08 74 6f 4d 89 f4 e8 5e ac 09 00 4d 8b 74 24 18
4d 85 f6 75 e3 e8 4f ac 09 00 8b 75 c8 48 c7 c7 78 c1 2e 85 e8 e0 cf f5
ff <0f> 0b 48 8b 75 c0 4c 89 ff e8 72 33 0b 03 48 8b 43 40 4c 8b a0 80
[    3.358012 ] RSP: 0000:ffffc90000017b48 EFLAGS: 00010082
[    3.358357 ] RAX: 0000000000000000 RBX: ffff888104dc8000 RCX:
0000000000000000
[    3.358814 ] RDX: ffff8881003c8000 RSI: ffffffff8124a9e6 RDI:
00000000ffffffff
[    3.359272 ] RBP: ffffc90000017b88 R08: 0000000000000000 R09:
0000000000000000
[    3.359732 ] R10: ffffc900000179f0 R11: 0000000000001d04 R12:
0000000000000000
[    3.360195 ] R13: ffff888107dc6000 R14: ffff888107dc6928 R15:
ffff888104dc80a8
[    3.360652 ] FS:  0000000000000000(0000) GS:ffff88817bc00000(0000)
knlGS:0000000000000000
[    3.361170 ] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.361538 ] CR2: 0000000000000000 CR3: 000000000582e000 CR4:
00000000000006f0
[    3.362003 ] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[    3.362175 ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[    3.362175 ] Call Trace:
[    3.362175 ]  nj_release+0x51/0x1e0
[    3.362175 ]  nj_probe+0x450/0x950
[    3.362175 ]  ? pci_device_remove+0x110/0x110
[    3.362175 ]  local_pci_probe+0x45/0xa0
[    3.362175 ]  pci_device_probe+0x12b/0x1d0
[    3.362175 ]  really_probe+0x2a9/0x610
[    3.362175 ]  driver_probe_device+0x90/0x1d0
[    3.362175 ]  ? mutex_lock_nested+0x1b/0x20
[    3.362175 ]  device_driver_attach+0x68/0x70
[    3.362175 ]  __driver_attach+0x124/0x1b0
[    3.362175 ]  ? device_driver_attach+0x70/0x70
[    3.362175 ]  bus_for_each_dev+0xbb/0x110
[    3.362175 ]  ? rdinit_setup+0x45/0x45
[    3.362175 ]  driver_attach+0x27/0x30
[    3.362175 ]  bus_add_driver+0x1eb/0x2a0
[    3.362175 ]  driver_register+0xa9/0x180
[    3.362175 ]  __pci_register_driver+0x82/0x90
[    3.362175 ]  ? w6692_init+0x38/0x38
[    3.362175 ]  nj_init+0x36/0x38
[    3.362175 ]  do_one_initcall+0x7f/0x3d0
[    3.362175 ]  ? rdinit_setup+0x45/0x45
[    3.362175 ]  ? rcu_read_lock_sched_held+0x4f/0x80
[    3.362175 ]  kernel_init_freeable+0x2aa/0x301
[    3.362175 ]  ? rest_init+0x2c0/0x2c0
[    3.362175 ]  kernel_init+0x18/0x190
[    3.362175 ]  ? rest_init+0x2c0/0x2c0
[    3.362175 ]  ? rest_init+0x2c0/0x2c0
[    3.362175 ]  ret_from_fork+0x1f/0x30
[    3.362175 ] Kernel panic - not syncing: panic_on_warn set ...
[    3.362175 ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.13.0-rc1-00144-g25a12987 #13
[    3.362175 ] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[    3.362175 ] Call Trace:
[    3.362175 ]  dump_stack+0xba/0xf5
[    3.362175 ]  ? free_irq+0x100/0x480
[    3.362175 ]  panic+0x15a/0x3f2
[    3.362175 ]  ? __warn+0xf2/0x150
[    3.362175 ]  ? free_irq+0x100/0x480
[    3.362175 ]  __warn+0x108/0x150
[    3.362175 ]  ? free_irq+0x100/0x480
[    3.362175 ]  report_bug+0x119/0x1c0
[    3.362175 ]  handle_bug+0x3b/0x80
[    3.362175 ]  exc_invalid_op+0x18/0x70
[    3.362175 ]  asm_exc_invalid_op+0x12/0x20
[    3.362175 ] RIP: 0010:free_irq+0x100/0x480
[    3.362175 ] Code: 6e 08 74 6f 4d 89 f4 e8 5e ac 09 00 4d 8b 74 24 18
4d 85 f6 75 e3 e8 4f ac 09 00 8b 75 c8 48 c7 c7 78 c1 2e 85 e8 e0 cf f5
ff <0f> 0b 48 8b 75 c0 4c 89 ff e8 72 33 0b 03 48 8b 43 40 4c 8b a0 80
[    3.362175 ] RSP: 0000:ffffc90000017b48 EFLAGS: 00010082
[    3.362175 ] RAX: 0000000000000000 RBX: ffff888104dc8000 RCX:
0000000000000000
[    3.362175 ] RDX: ffff8881003c8000 RSI: ffffffff8124a9e6 RDI:
00000000ffffffff
[    3.362175 ] RBP: ffffc90000017b88 R08: 0000000000000000 R09:
0000000000000000
[    3.362175 ] R10: ffffc900000179f0 R11: 0000000000001d04 R12:
0000000000000000
[    3.362175 ] R13: ffff888107dc6000 R14: ffff888107dc6928 R15:
ffff888104dc80a8
[    3.362175 ]  ? vprintk+0x76/0x150
[    3.362175 ]  ? free_irq+0x100/0x480
[    3.362175 ]  nj_release+0x51/0x1e0
[    3.362175 ]  nj_probe+0x450/0x950
[    3.362175 ]  ? pci_device_remove+0x110/0x110
[    3.362175 ]  local_pci_probe+0x45/0xa0
[    3.362175 ]  pci_device_probe+0x12b/0x1d0
[    3.362175 ]  really_probe+0x2a9/0x610
[    3.362175 ]  driver_probe_device+0x90/0x1d0
[    3.362175 ]  ? mutex_lock_nested+0x1b/0x20
[    3.362175 ]  device_driver_attach+0x68/0x70
[    3.362175 ]  __driver_attach+0x124/0x1b0
[    3.362175 ]  ? device_driver_attach+0x70/0x70
[    3.362175 ]  bus_for_each_dev+0xbb/0x110
[    3.362175 ]  ? rdinit_setup+0x45/0x45
[    3.362175 ]  driver_attach+0x27/0x30
[    3.362175 ]  bus_add_driver+0x1eb/0x2a0
[    3.362175 ]  driver_register+0xa9/0x180
[    3.362175 ]  __pci_register_driver+0x82/0x90
[    3.362175 ]  ? w6692_init+0x38/0x38
[    3.362175 ]  nj_init+0x36/0x38
[    3.362175 ]  do_one_initcall+0x7f/0x3d0
[    3.362175 ]  ? rdinit_setup+0x45/0x45
[    3.362175 ]  ? rcu_read_lock_sched_held+0x4f/0x80
[    3.362175 ]  kernel_init_freeable+0x2aa/0x301
[    3.362175 ]  ? rest_init+0x2c0/0x2c0
[    3.362175 ]  kernel_init+0x18/0x190
[    3.362175 ]  ? rest_init+0x2c0/0x2c0
[    3.362175 ]  ? rest_init+0x2c0/0x2c0
[    3.362175 ]  ret_from_fork+0x1f/0x30
[    3.362175 ] Dumping ftrace buffer:
[    3.362175 ]    (ftrace buffer empty)
[    3.362175 ] Kernel Offset: disabled
[    3.362175 ] Rebooting in 1 seconds..
Reported-by: NZheyu Ma <zheyuma97@gmail.com>
Signed-off-by: NZheyu Ma <zheyuma97@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f6f8525

Merge branch 'bnxt_en-fixes' · 3aa21e79

由 David S. Miller 提交于 5月 17, 2021

Michael Chan says:

====================
bnxt_en: 2 bug fixes.

The first one fixes a bug to properly identify some recently added HyperV
device IDs.  The second one fixes device context memory set up on systems
with 64K page size.

Please queue these for -stable as well.  Thanks.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3aa21e79

bnxt_en: Fix context memory setup for 64K page size. · 702279d2

由 Michael Chan 提交于 5月 15, 2021

There was a typo in the code that checks for 64K BNXT_PAGE_SHIFT in
bnxt_hwrm_set_pg_attr().  Fix it and make the code more understandable
with a new macro BNXT_SET_CTX_PAGE_ATTR().

Fixes: 1b9394e5 ("bnxt_en: Configure context memory on new devices.")
Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

702279d2

bnxt_en: Include new P5 HV definition in VF check. · ab21494b

由 Andy Gospodarek 提交于 5月 15, 2021

Otherwise, some of the recently added HyperV VF IDs would not be
recognized as VF devices and they would not initialize properly.

Fixes: 7fbf359b ("bnxt_en: Add PCI IDs for Hyper-V VF devices.")
Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com>
Signed-off-by: NAndy Gospodarek <gospo@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab21494b

net: bnx2: Fix error return code in bnx2_init_board() · 28c66b6d

由 Zhen Lei 提交于 5月 15, 2021

Fix to return -EPERM from the error handling case instead of 0, as done
elsewhere in this function.

Fixes: b6016b76 ("[BNX2]: New Broadcom gigabit network driver.")
Reported-by: NHulk Robot <hulkci@huawei.com>
Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

28c66b6d

NFC: nci: fix memory leak in nci_allocate_device · e0652f8b

由 Dongliang Mu 提交于 5月 15, 2021

nfcmrvl_disconnect fails to free the hci_dev field in struct nci_dev.
Fix this by freeing hci_dev in nci_free_device.

BUG: memory leak
unreferenced object 0xffff888111ea6800 (size 1024):
  comm "kworker/1:0", pid 19, jiffies 4294942308 (age 13.580s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 60 fd 0c 81 88 ff ff  .........`......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000004bc25d43>] kmalloc include/linux/slab.h:552 [inline]
    [<000000004bc25d43>] kzalloc include/linux/slab.h:682 [inline]
    [<000000004bc25d43>] nci_hci_allocate+0x21/0xd0 net/nfc/nci/hci.c:784
    [<00000000c59cff92>] nci_allocate_device net/nfc/nci/core.c:1170 [inline]
    [<00000000c59cff92>] nci_allocate_device+0x10b/0x160 net/nfc/nci/core.c:1132
    [<00000000006e0a8e>] nfcmrvl_nci_register_dev+0x10a/0x1c0 drivers/nfc/nfcmrvl/main.c:153
    [<000000004da1b57e>] nfcmrvl_probe+0x223/0x290 drivers/nfc/nfcmrvl/usb.c:345
    [<00000000d506aed9>] usb_probe_interface+0x177/0x370 drivers/usb/core/driver.c:396
    [<00000000bc632c92>] really_probe+0x159/0x4a0 drivers/base/dd.c:554
    [<00000000f5009125>] driver_probe_device+0x84/0x100 drivers/base/dd.c:740
    [<000000000ce658ca>] __device_attach_driver+0xee/0x110 drivers/base/dd.c:846
    [<000000007067d05f>] bus_for_each_drv+0xb7/0x100 drivers/base/bus.c:431
    [<00000000f8e13372>] __device_attach+0x122/0x250 drivers/base/dd.c:914
    [<000000009cf68860>] bus_probe_device+0xc6/0xe0 drivers/base/bus.c:491
    [<00000000359c965a>] device_add+0x5be/0xc30 drivers/base/core.c:3109
    [<00000000086e4bd3>] usb_set_configuration+0x9d9/0xb90 drivers/usb/core/message.c:2164
    [<00000000ca036872>] usb_generic_driver_probe+0x8c/0xc0 drivers/usb/core/generic.c:238
    [<00000000d40d36f6>] usb_probe_device+0x5c/0x140 drivers/usb/core/driver.c:293
    [<00000000bc632c92>] really_probe+0x159/0x4a0 drivers/base/dd.c:554

Reported-by: syzbot+19bcfc64a8df1318d1c3@syzkaller.appspotmail.com
Fixes: 11f54f22 ("NFC: nci: Add HCI over NCI protocol support")
Signed-off-by: NDongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0652f8b

net: hso: check for allocation failure in hso_create_bulk_serial_device() · 31db0dbd

由 Dan Carpenter 提交于 5月 14, 2021

In current kernels, small allocations never actually fail so this
patch shouldn't affect runtime.

Originally this error handling code written with the idea that if
the "serial->tiocmget" allocation failed, then we would continue
operating instead of bailing out early.  But in later years we added
an unchecked dereference on the next line.

	serial->tiocmget->serial_state_notification = kzalloc();
        ^^^^^^^^^^^^^^^^^^

Since these allocations are never going fail in real life, this is
mostly a philosophical debate, but I think bailing out early is the
correct behavior that the user would want.  And generally it's safer to
bail as soon an error happens.

Fixes: af0de130 ("usb: hso: obey DMA rules in tiocmget")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NJohan Hovold <johan@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31db0dbd

tipc: skb_linearize the head skb when reassembling msgs · b7df21cf

由 Xin Long 提交于 5月 08, 2021

It's not a good idea to append the frag skb to a skb's frag_list if
the frag_list already has skbs from elsewhere, such as this skb was
created by pskb_copy() where the frag_list was cloned (all the skbs
in it were skb_get'ed) and shared by multiple skbs.

However, the new appended frag skb should have been only seen by the
current skb. Otherwise, it will cause use after free crashes as this
appended frag skb are seen by multiple skbs but it only got skb_get
called once.

The same thing happens with a skb updated by pskb_may_pull() with a
skb_cloned skb. Li Shuang has reported quite a few crashes caused
by this when doing testing over macvlan devices:

  [] kernel BUG at net/core/skbuff.c:1970!
  [] Call Trace:
  []  skb_clone+0x4d/0xb0
  []  macvlan_broadcast+0xd8/0x160 [macvlan]
  []  macvlan_process_broadcast+0x148/0x150 [macvlan]
  []  process_one_work+0x1a7/0x360
  []  worker_thread+0x30/0x390

  [] kernel BUG at mm/usercopy.c:102!
  [] Call Trace:
  []  __check_heap_object+0xd3/0x100
  []  __check_object_size+0xff/0x16b
  []  simple_copy_to_iter+0x1c/0x30
  []  __skb_datagram_iter+0x7d/0x310
  []  __skb_datagram_iter+0x2a5/0x310
  []  skb_copy_datagram_iter+0x3b/0x90
  []  tipc_recvmsg+0x14a/0x3a0 [tipc]
  []  ____sys_recvmsg+0x91/0x150
  []  ___sys_recvmsg+0x7b/0xc0

  [] kernel BUG at mm/slub.c:305!
  [] Call Trace:
  []  <IRQ>
  []  kmem_cache_free+0x3ff/0x400
  []  __netif_receive_skb_core+0x12c/0xc40
  []  ? kmem_cache_alloc+0x12e/0x270
  []  netif_receive_skb_internal+0x3d/0xb0
  []  ? get_rx_page_info+0x8e/0xa0 [be2net]
  []  be_poll+0x6ef/0xd00 [be2net]
  []  ? irq_exit+0x4f/0x100
  []  net_rx_action+0x149/0x3b0

  ...

This patch is to fix it by linearizing the head skb if it has frag_list
set in tipc_buf_append(). Note that we choose to do this before calling
skb_unshare(), as __skb_linearize() will avoid skb_copy(). Also, we can
not just drop the frag_list either as the early time.

Fixes: 45c8b7b1 ("tipc: allow non-linear first fragment buffer")
Reported-by: NLi Shuang <shuali@redhat.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NJon Maloy <jmaloy@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b7df21cf

15 5月, 2021 7 次提交

net: cdc_eem: fix URL to CDC EEM 1.0 spec · b81ac784

由 Jonathan Davies 提交于 5月 14, 2021

The old URL is no longer accessible.
Signed-off-by: NJonathan Davies <jonathan.davies@nutanix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b81ac784

Merge branch 'lockless-qdisc-packet-stuck' · a0c5393d

由 David S. Miller 提交于 5月 14, 2021

Yunsheng Lin says:

====================
ix packet stuck problem for lockless qdisc

This patchset fixes the packet stuck problem mentioned in [1].

Patch 1: Add STATE_MISSED flag to fix packet stuck problem.
Patch 2: Fix a tx_action rescheduling problem after STATE_MISSED
         flag is added in patch 1.
Patch 3: Fix the significantly higher CPU consumption problem when
         multiple threads are competing on a saturated outgoing
         device.

V8: Change function name as suggested by Jakub and fix some typo
    in patch 3, adjust commit log in patch 2, and add Acked-by
    from Jakub.
V7: Fix netif_tx_wake_queue() data race noted by Jakub.
V6: Some performance optimization in patch 1 suggested by Jakub
    and drop NET_XMIT_DROP checking in patch 3.
V5: add patch 3 to fix the problem reported by Michal Kubecek.
V4: Change STATE_NEED_RESCHEDULE to STATE_MISSED and add patch 2.

[1]. https://lkml.org/lkml/2019/10/9/42
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a0c5393d

net: sched: fix tx action reschedule issue with stopped queue · dcad9ee9

由 Yunsheng Lin 提交于 5月 14, 2021

The netdev qeueue might be stopped when byte queue limit has
reached or tx hw ring is full, net_tx_action() may still be
rescheduled if STATE_MISSED is set, which consumes unnecessary
cpu without dequeuing and transmiting any skb because the
netdev queue is stopped, see qdisc_run_end().

This patch fixes it by checking the netdev queue state before
calling qdisc_run() and clearing STATE_MISSED if netdev queue is
stopped during qdisc_run(), the net_tx_action() is rescheduled
again when netdev qeueue is restarted, see netif_tx_wake_queue().

As there is time window between netif_xmit_frozen_or_stopped()
checking and STATE_MISSED clearing, between which STATE_MISSED
may set by net_tx_action() scheduled by netif_tx_wake_queue(),
so set the STATE_MISSED again if netdev queue is restarted.

Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
Reported-by: NMichal Kubecek <mkubecek@suse.cz>
Acked-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dcad9ee9

net: sched: fix tx action rescheduling issue during deactivation · 102b55ee

由 Yunsheng Lin 提交于 5月 14, 2021

Currently qdisc_run() checks the STATE_DEACTIVATED of lockless
qdisc before calling __qdisc_run(), which ultimately clear the
STATE_MISSED when all the skb is dequeued. If STATE_DEACTIVATED
is set before clearing STATE_MISSED, there may be rescheduling
of net_tx_action() at the end of qdisc_run_end(), see below:

CPU0(net_tx_atcion)  CPU1(__dev_xmit_skb)  CPU2(dev_deactivate)
          .                   .                     .
          .            set STATE_MISSED             .
          .           __netif_schedule()            .
          .                   .           set STATE_DEACTIVATED
          .                   .                qdisc_reset()
          .                   .                     .
          .<---------------   .              synchronize_net()
clear __QDISC_STATE_SCHED  |  .                     .
          .                |  .                     .
          .                |  .            some_qdisc_is_busy()
          .                |  .               return *false*
          .                |  .                     .
  test STATE_DEACTIVATED   |  .                     .
__qdisc_run() *not* called |  .                     .
          .                |  .                     .
   test STATE_MISS         |  .                     .
 __netif_schedule()--------|  .                     .
          .                   .                     .
          .                   .                     .

__qdisc_run() is not called by net_tx_atcion() in CPU0 because
CPU2 has set STATE_DEACTIVATED flag during dev_deactivate(), and
STATE_MISSED is only cleared in __qdisc_run(), __netif_schedule
is called at the end of qdisc_run_end(), causing tx action
rescheduling problem.

qdisc_run() called by net_tx_action() runs in the softirq context,
which should has the same semantic as the qdisc_run() called by
__dev_xmit_skb() protected by rcu_read_lock_bh(). And there is a
synchronize_net() between STATE_DEACTIVATED flag being set and
qdisc_reset()/some_qdisc_is_busy in dev_deactivate(), we can safely
bail out for the deactived lockless qdisc in net_tx_action(), and
qdisc_reset() will reset all skb not dequeued yet.

So add the rcu_read_lock() explicitly to protect the qdisc_run()
and do the STATE_DEACTIVATED checking in net_tx_action() before
calling qdisc_run_begin(). Another option is to do the checking in
the qdisc_run_end(), but it will add unnecessary overhead for
non-tx_action case, because __dev_queue_xmit() will not see qdisc
with STATE_DEACTIVATED after synchronize_net(), the qdisc with
STATE_DEACTIVATED can only be seen by net_tx_action() because of
__netif_schedule().

The STATE_DEACTIVATED checking in qdisc_run() is to avoid race
between net_tx_action() and qdisc_reset(), see:
commit d518d2ed ("net/sched: fix race between deactivation
and dequeue for NOLOCK qdisc"). As the bailout added above for
deactived lockless qdisc in net_tx_action() provides better
protection for the race without calling qdisc_run() at all, so
remove the STATE_DEACTIVATED checking in qdisc_run().

After qdisc_reset(), there is no skb in qdisc to be dequeued, so
clear the STATE_MISSED in dev_reset_queue() too.

Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
Acked-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
V8: Clearing STATE_MISSED before calling __netif_schedule() has
    avoid the endless rescheduling problem, but there may still
    be a unnecessary rescheduling, so adjust the commit log.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

102b55ee

net: sched: fix packet stuck problem for lockless qdisc · a90c57f2

由 Yunsheng Lin 提交于 5月 14, 2021

Lockless qdisc has below concurrent problem:
    cpu0                 cpu1
     .                     .
q->enqueue                 .
     .                     .
qdisc_run_begin()          .
     .                     .
dequeue_skb()              .
     .                     .
sch_direct_xmit()          .
     .                     .
     .                q->enqueue
     .             qdisc_run_begin()
     .            return and do nothing
     .                     .
qdisc_run_end()            .

cpu1 enqueue a skb without calling __qdisc_run() because cpu0
has not released the lock yet and spin_trylock() return false
for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
enqueued by cpu1 when calling dequeue_skb() because cpu1 may
enqueue the skb after cpu0 calling dequeue_skb() and before
cpu0 calling qdisc_run_end().

Lockless qdisc has below another concurrent problem when
tx_action is involved:

cpu0(serving tx_action)     cpu1             cpu2
          .                   .                .
          .              q->enqueue            .
          .            qdisc_run_begin()       .
          .              dequeue_skb()         .
          .                   .            q->enqueue
          .                   .                .
          .             sch_direct_xmit()      .
          .                   .         qdisc_run_begin()
          .                   .       return and do nothing
          .                   .                .
 clear __QDISC_STATE_SCHED    .                .
 qdisc_run_begin()            .                .
 return and do nothing        .                .
          .                   .                .
          .            qdisc_run_end()         .

This patch fixes the above data race by:
1. If the first spin_trylock() return false and STATE_MISSED is
   not set, set STATE_MISSED and retry another spin_trylock() in
   case other CPU may not see STATE_MISSED after it releases the
   lock.
2. reschedule if STATE_MISSED is set after the lock is released
   at the end of qdisc_run_end().

For tx_action case, STATE_MISSED is also set when cpu1 is at the
end if qdisc_run_end(), so tx_action will be rescheduled again
to dequeue the skb enqueued by cpu2.

Clear STATE_MISSED before retrying a dequeuing when dequeuing
returns NULL in order to reduce the overhead of the second
spin_trylock() and __netif_schedule() calling.

Also clear the STATE_MISSED before calling __netif_schedule()
at the end of qdisc_run_end() to avoid doing another round of
dequeuing in the pfifo_fast_dequeue().

The performance impact of this patch, tested using pktgen and
dummy netdev with pfifo_fast qdisc attached:

 threads  without+this_patch   with+this_patch      delta
    1        2.61Mpps            2.60Mpps           -0.3%
    2        3.97Mpps            3.82Mpps           -3.7%
    4        5.62Mpps            5.59Mpps           -0.5%
    8        2.78Mpps            2.77Mpps           -0.3%
   16        2.22Mpps            2.22Mpps           -0.0%

Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
Acked-by: NJakub Kicinski <kuba@kernel.org>
Tested-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a90c57f2

tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT · 974271e5

由 Jim Ma 提交于 5月 14, 2021

In tls_sw_splice_read, checkout MSG_* is inappropriate, should use
SPLICE_*, update tls_wait_data to accept nonblock arguments instead
of flags for recvmsg and splice.

Fixes: c46234eb ("tls: RX path for ktls")
Signed-off-by: NJim Ma <majinjing3@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

974271e5

Revert "net:tipc: Fix a double free in tipc_sk_mcast_rcv" · 75016891

由 Hoang Le 提交于 5月 14, 2021

This reverts commit 6bf24dc0.
Above fix is not correct and caused memory leak issue.

Fixes: 6bf24dc0 ("net:tipc: Fix a double free in tipc_sk_mcast_rcv")
Acked-by: NJon Maloy <jmaloy@redhat.com>
Acked-by: NTung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: NHoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

75016891

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功