Commits · eba93de6d31c1734dee59909020a162de612e41e · openeuler / Kernel

Nov 17, 2020 · 1 commit

bnxt_en: Free port stats during firmware reset. · eba93de6

Committed by Michael Chan on Nov 15, 2020

Firmware is unable to retain the port counters during any kind of
fatal or non-fatal reset, so we must clear the port counters to
avoid false detection of port counter overflow.
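The failure mode can be illustrated with a minimal userspace sketch (not the driver's code) that folds a free-running, fixed-width hardware counter into a 64-bit software total. The names `struct port_ctr`, `ctr_accumulate()` and `ctr_reset_snapshot()`, and the 48-bit counter width, are hypothetical. If firmware restarts its counter near zero while the driver still holds the old raw value, the smaller reading looks exactly like a wrap and the total jumps by almost 2^48; clearing the driver's copy first avoids that.

```c
#include <stdint.h>

/* Hypothetical: hardware counters are modeled as CTR_BITS wide and wrap to 0. */
#define CTR_BITS 48
#define CTR_MASK ((UINT64_C(1) << CTR_BITS) - 1)

struct port_ctr {
	uint64_t prev_hw;  /* last raw value read from hardware */
	uint64_t total;    /* 64-bit accumulated software total */
};

/* Fold a new raw hardware reading into the software total.
 * A reading smaller than the previous one is treated as a wrap. */
static void ctr_accumulate(struct port_ctr *c, uint64_t hw)
{
	hw &= CTR_MASK;
	/* Modular subtraction yields the delta, including across one wrap. */
	uint64_t delta = (hw - c->prev_hw) & CTR_MASK;

	c->total += delta;
	c->prev_hw = hw;
}

/* Call when firmware resets its counters: forget the old raw value so the
 * next reading (which restarts near 0) is not mistaken for a wrap. */
static void ctr_reset_snapshot(struct port_ctr *c)
{
	c->prev_hw = 0;
}
```

Reading 1000 and then, after a firmware reset, 10 without clearing the snapshot yields a total of 2^48 + 10; with the snapshot cleared it yields 1010.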

Fixes: fea6b333 ("bnxt_en: Accumulate all counters.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Oct 27, 2020 · 5 commits

bnxt_en: Send HWRM_FUNC_RESET fw command unconditionally. · 825741b0

Committed by Vasundhara Volam on Oct 26, 2020

In the AER or firmware reset flow, if we are in fatal error state or
if pci_channel_offline() is true, we don't send any commands to the
firmware because the commands will likely not reach the firmware and
most commands don't matter much because the firmware is likely to be
reset imminently.

However, the HWRM_FUNC_RESET command is different and we should always
attempt to send it.  In the AER flow for example, the .slot_reset()
call will trigger this fw command and we need to try to send it to
effect the proper reset.
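The gating rule described above can be sketched as a predicate. This is an illustrative userspace model only: `struct dev_state`, `should_send_fw_cmd()` and the `FW_CMD_*` identifiers are hypothetical stand-ins, and the `pci_offline` field merely represents the result of pci_channel_offline().

```c
#include <stdbool.h>

/* Hypothetical command IDs; the real HWRM opcodes come from firmware headers. */
enum fw_cmd { FW_CMD_FUNC_QCAPS, FW_CMD_FUNC_RESET, FW_CMD_PORT_QSTATS };

struct dev_state {
	bool fw_fatal;     /* firmware is in a fatal error state */
	bool pci_offline;  /* stands in for pci_channel_offline() */
};

/* Decide whether a firmware command should be attempted. */
static bool should_send_fw_cmd(const struct dev_state *s, enum fw_cmd cmd)
{
	/* FUNC_RESET is always attempted: it is what effects the reset,
	 * e.g. when triggered from the AER .slot_reset() callback. */
	if (cmd == FW_CMD_FUNC_RESET)
		return true;
	/* Other commands are skipped in the error states: they would likely
	 * not reach the firmware, which is about to be reset anyway. */
	return !(s->fw_fatal || s->pci_offline);
}
```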

Fixes: b340dc68 ("bnxt_en: Avoid sending firmware messages when AER error is detected.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Check abort error state in bnxt_open_nic(). · a1301f08

Committed by Michael Chan on Oct 26, 2020

bnxt_open_nic() is called during configuration changes that require
the NIC to be closed and then opened.  This call is protected by
rtnl_lock.  Firmware reset can be happening at the same time.  Only
critical portions of the entire firmware reset sequence are protected
by the rtnl_lock, so bnxt_open_nic() may be called while the
firmware reset sequence is aborting.  In that case,
bnxt_open_nic() needs to check if the ABORT_ERR flag is set and
abort if it is.  The configuration change that resulted in the
bnxt_open_nic() call will fail but the NIC will be brought to a
consistent IF_DOWN state.
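A minimal sketch of the check, assuming a hypothetical `struct nic` whose `state` bit field carries an abort flag set by the reset task. `STATE_ABORT_ERR`, `nic_open()` and `nic_do_open()` are illustrative names, and returning -EIO is an assumption of this sketch rather than something stated in the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state bit, modeled on the ABORT_ERR flag described above. */
#define STATE_ABORT_ERR (1UL << 0)

struct nic {
	unsigned long state;  /* bit flags set by the firmware reset task */
	bool if_up;           /* true once the open path has completed */
	int open_calls;       /* counts calls into the real open path */
};

/* Stand-in for the real open path (ring and memory allocation, etc.). */
static int nic_do_open(struct nic *n)
{
	n->open_calls++;
	n->if_up = true;
	return 0;
}

/* Caller holds the rtnl_lock equivalent. If the firmware reset sequence
 * is aborting, fail the open and leave the NIC in a consistent down state
 * instead of touching resources the reset may have freed.
 * -EIO is an assumed error code for this sketch. */
static int nic_open(struct nic *n)
{
	if (n->state & STATE_ABORT_ERR) {
		n->if_up = false;
		return -EIO;
	}
	return nic_do_open(n);
}
```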

Without this patch, if bnxt_open_nic() were to continue in this error
state, it may crash like this:

[ 1648.659736] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1648.659768] IP: [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
[ 1648.659796] PGD 101e1b3067 PUD 101e1b2067 PMD 0
[ 1648.659813] Oops: 0000 [#1] SMP
[ 1648.659825] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dell_smbios dell_wmi_descriptor dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper vfat cryptd fat pcspkr ipmi_ssif sg k10temp i2c_piix4 wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci megaraid_sas crct10dif_pclmul crct10dif_common
[ 1648.660063]  tg3 libata crc32c_intel bnxt_en(OE) drm_panel_orientation_quirks devlink ptp pps_core dm_mirror dm_region_hash dm_log dm_mod fuse
[ 1648.660105] CPU: 13 PID: 3867 Comm: ethtool Kdump: loaded Tainted: G           OE  ------------   3.10.0-1152.el7.x86_64 #1
[ 1648.660911] Hardware name: Dell Inc. PowerEdge R7515/0R4CNN, BIOS 1.2.14 01/28/2020
[ 1648.661662] task: ffff94e64cbc9080 ti: ffff94f55df1c000 task.ti: ffff94f55df1c000
[ 1648.662409] RIP: 0010:[<ffffffffc01e9b3a>]  [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
[ 1648.663171] RSP: 0018:ffff94f55df1fba8  EFLAGS: 00010202
[ 1648.663927] RAX: 0000000000000000 RBX: ffff94e6827e0000 RCX: 0000000000000000
[ 1648.664684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94e6827e08c0
[ 1648.665433] RBP: ffff94f55df1fc20 R08: 00000000000001ff R09: 0000000000000008
[ 1648.666184] R10: 0000000000000d53 R11: ffff94f55df1f7ce R12: ffff94e6827e08c0
[ 1648.666940] R13: ffff94e6827e08c0 R14: ffff94e6827e08c0 R15: ffffffffb9115e40
[ 1648.667695] FS:  00007f8aadba5740(0000) GS:ffff94f57eb40000(0000) knlGS:0000000000000000
[ 1648.668447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1648.669202] CR2: 0000000000000000 CR3: 0000001022772000 CR4: 0000000000340fe0
[ 1648.669966] Call Trace:
[ 1648.670730]  [<ffffffffc01f1d5d>] ? bnxt_need_reserve_rings+0x9d/0x170 [bnxt_en]
[ 1648.671496]  [<ffffffffc01fa7ea>] __bnxt_open_nic+0x8a/0x9a0 [bnxt_en]
[ 1648.672263]  [<ffffffffc01f7479>] ? bnxt_close_nic+0x59/0x1b0 [bnxt_en]
[ 1648.673031]  [<ffffffffc01fb11b>] bnxt_open_nic+0x1b/0x50 [bnxt_en]
[ 1648.673793]  [<ffffffffc020037c>] bnxt_set_ringparam+0x6c/0xa0 [bnxt_en]
[ 1648.674550]  [<ffffffffb8a5f564>] dev_ethtool+0x1334/0x21a0
[ 1648.675306]  [<ffffffffb8a719ff>] dev_ioctl+0x1ef/0x5f0
[ 1648.676061]  [<ffffffffb8a324bd>] sock_do_ioctl+0x4d/0x60
[ 1648.676810]  [<ffffffffb8a326bb>] sock_ioctl+0x1eb/0x2d0
[ 1648.677548]  [<ffffffffb8663230>] do_vfs_ioctl+0x3a0/0x5b0
[ 1648.678282]  [<ffffffffb8b8e678>] ? __do_page_fault+0x238/0x500
[ 1648.679016]  [<ffffffffb86634e1>] SyS_ioctl+0xa1/0xc0
[ 1648.679745]  [<ffffffffb8b93f92>] system_call_fastpath+0x25/0x2a
[ 1648.680461] Code: 9e 60 01 00 00 0f 1f 40 00 45 8b 8e 48 01 00 00 31 c9 45 85 c9 0f 8e 73 01 00 00 66 0f 1f 44 00 00 49 8b 86 a8 00 00 00 48 63 d1 <48> 8b 14 d0 48 85 d2 0f 84 46 01 00 00 41 8b 86 44 01 00 00 c7
[ 1648.681986] RIP  [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en]
[ 1648.682724]  RSP <ffff94f55df1fba8>
[ 1648.683451] CR2: 0000000000000000

Fixes: ec5d31e3 ("bnxt_en: Handle firmware reset status during IF_UP.")
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Re-write PCI BARs after PCI fatal error. · f75d9a0a

Committed by Vasundhara Volam on Oct 26, 2020

When a PCIe fatal error occurs, the internally latched BAR addresses
in the chip get reset even though the BAR register values in config
space are retained.

pci_restore_state() will not rewrite the BAR addresses if the
BAR address values are valid, causing the chip's internal BAR addresses
to stay invalid.  So we need to zero the BAR registers during PCIe fatal
error to force pci_restore_state() to restore the BAR addresses.  These
write cycles to the BAR registers will cause the proper BAR addresses to
latch internally.
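The interaction can be modeled in userspace to show why zeroing helps. This sketch assumes, as the commit message implies, that the restore step skips a BAR whose current config value already matches the saved value; `struct model_dev` and the `model_*()` and `zero_bars()` helpers are hypothetical and only approximate pci_restore_state().

```c
#include <stdint.h>

#define NUM_BARS 6

/* Model of a device: BAR registers in config space, the values captured
 * at save time, and the chip's internally latched copy. */
struct model_dev {
	uint32_t cfg_bar[NUM_BARS];
	uint32_t saved_bar[NUM_BARS];
	uint32_t latched_bar[NUM_BARS];
};

/* A config write updates both the register and the internal latch. */
static void cfg_write_bar(struct model_dev *d, int i, uint32_t val)
{
	d->cfg_bar[i] = val;
	d->latched_bar[i] = val;
}

/* Assumed restore behavior: write a BAR only if its current config
 * value differs from the saved one. */
static void model_restore_state(struct model_dev *d)
{
	for (int i = 0; i < NUM_BARS; i++)
		if (d->cfg_bar[i] != d->saved_bar[i])
			cfg_write_bar(d, i, d->saved_bar[i]);
}

/* Fatal error: the chip loses its latched BARs, config space keeps them. */
static void model_fatal_error(struct model_dev *d)
{
	for (int i = 0; i < NUM_BARS; i++)
		d->latched_bar[i] = 0;
}

/* The workaround: zero the BAR registers so the restore must rewrite them. */
static void zero_bars(struct model_dev *d)
{
	for (int i = 0; i < NUM_BARS; i++)
		cfg_write_bar(d, i, 0);
}
```

After a fatal error, a plain restore leaves the latched copy at zero; calling zero_bars() before the restore makes every (non-zero) BAR differ from its saved value, so each one is rewritten and latched again.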

Fixes: 6316ea6d ("bnxt_en: Enable AER support.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Invoke cancel_delayed_work_sync() for PFs also. · 631ce27a

Committed by Vasundhara Volam on Oct 26, 2020

As part of the commit b148bb23
("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."),
cancel_delayed_work_sync() is called only for VFs to fix a possible
crash by cancelling any pending delayed work items. It was assumed
by mistake that the flush_workqueue() call on the PF would flush
delayed work items as well.

As flush_workqueue() does not cancel pending delayed work items, extend
the fix to PFs.  This avoids a system crash if any delayed work, such as
fw_reset_task(), is still pending during the driver's .remove() call.

Unify the workqueue cleanup logic for both PF and VF by calling
cancel_work_sync() and cancel_delayed_work_sync() directly in
bnxt_remove_one().
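The difference between flushing and cancelling can be captured in a small model. `struct model_wq`, `model_flush()` and `model_cancel_all()` are hypothetical; the model assumes only what the commit message states, namely that flush_workqueue() does not cancel a delayed work item whose timer has not yet fired, whereas cancel_delayed_work_sync() does.

```c
#include <stdbool.h>

/* Model of a workqueue holding immediate work items plus one delayed
 * work item that may still be waiting on its timer. */
struct model_wq {
	int queued;          /* work items already on the queue */
	bool delayed_armed;  /* delayed item whose timer has not fired */
};

/* Models flush_workqueue(): queued items run to completion (modeled as
 * the queue draining). A delayed item whose timer has not fired is not
 * on the queue yet, so it is left armed. */
static void model_flush(struct model_wq *wq)
{
	wq->queued = 0;
}

/* Models cancel_work_sync() plus cancel_delayed_work_sync(), called
 * unconditionally for both PFs and VFs. */
static void model_cancel_all(struct model_wq *wq)
{
	wq->queued = 0;
	wq->delayed_armed = false;
}
```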

Fixes: b148bb23 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Fix regression in workqueue cleanup logic in bnxt_remove_one(). · 21d6a11e

Committed by Vasundhara Volam on Oct 26, 2020

A recent patch has moved the workqueue cleanup logic before
calling unregister_netdev() in bnxt_remove_one(). This caused a
regression because the workqueue can be restarted if the device is
still open. Workqueue cleanup must be done after unregister_netdev().
The workqueue will not restart itself after the device is closed.

Call bnxt_cancel_sp_work() after unregister_netdev() and
call bnxt_dl_fw_reporters_destroy() after that.  This fixes the
regression and the original NULL ptr dereference issue.
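Why the order matters can be shown with a toy model in which events re-arm service work only while the device is open. All names here (`struct model_dev`, `model_event()`, `remove_cancel_first()`, `remove_unregister_first()`) are hypothetical; bnxt_dl_fw_reporters_destroy() would follow the cancel step and is left out of the model.

```c
#include <stdbool.h>

/* Toy model: while the device is open, an event may (re)schedule the
 * service work; once the device is closed, nothing reschedules it. */
struct model_dev {
	bool open;
	bool work_pending;
};

static void model_event(struct model_dev *d)
{
	if (d->open)
		d->work_pending = true;
}

static void model_unregister_netdev(struct model_dev *d)
{
	d->open = false;  /* unregistering closes the device */
}

static void model_cancel_sp_work(struct model_dev *d)
{
	d->work_pending = false;
}

/* Regressed order: cancel, then unregister. An event arriving in between
 * re-arms the work on a still-open device. Returns true if work leaked. */
static bool remove_cancel_first(struct model_dev *d)
{
	model_cancel_sp_work(d);
	model_event(d);
	model_unregister_netdev(d);
	return d->work_pending;
}

/* Fixed order: unregister (close) first, then cancel. */
static bool remove_unregister_first(struct model_dev *d)
{
	model_unregister_netdev(d);
	model_event(d);
	model_cancel_sp_work(d);
	return d->work_pending;
}
```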

Fixes: b16939b5 ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Oct 13, 2020 · 6 commits

bnxt_en: Add bnxt_hwrm_nvm_get_dev_info() to query NVM info. · 4933f675

Committed by Vasundhara Volam on Oct 12, 2020

Add a new function, bnxt_hwrm_nvm_get_dev_info(), to query firmware
version information via the NVM_GET_DEV_INFO firmware command.  Use it
to get the running version of the NVM configuration information.

This new function will also be used in subsequent patches to get the
stored firmware versions.

Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-8-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Log unknown link speed appropriately. · 8eddb3e7

Committed by Michael Chan on Oct 12, 2020

If the VF virtual link is set to always enabled, the speed may be
unknown when the physical link is down.  The driver currently logs
the link speed as 4294967295 Mbps, which is SPEED_UNKNOWN.  Change
the link-up log message to print "speed unknown" instead, which makes
more sense.
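A sketch of the formatting change, assuming a hypothetical helper `format_link_speed()`. SPEED_UNKNOWN is -1 in <linux/ethtool.h>; it is mirrored locally here so the example builds in userspace, and once cast to a 32-bit unsigned value it prints as 4294967295.

```c
#include <inttypes.h>
#include <stdio.h>

/* Mirrors SPEED_UNKNOWN from <linux/ethtool.h>. */
#define SPEED_UNKNOWN (-1)

/* Format the link-up speed text into buf; returns the snprintf() result
 * (the number of characters that would be written). */
static int format_link_speed(char *buf, size_t len, uint32_t speed_mbps)
{
	/* An unknown speed stored in a u32 would otherwise print as 4294967295. */
	if (speed_mbps == (uint32_t)SPEED_UNKNOWN)
		return snprintf(buf, len, "speed unknown");
	return snprintf(buf, len, "%" PRIu32 " Mbps", speed_mbps);
}
```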

Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-7-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Log event_data1 and event_data2 when handling RESET_NOTIFY event. · c966c67c

Committed by Michael Chan on Oct 12, 2020

Log these values that contain useful firmware state information.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-6-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Simplify bnxt_async_event_process(). · 03ab8ca1

Committed by Michael Chan on Oct 12, 2020

event_data1 and event_data2 are used when processing most events.
Store these in local variables at the beginning of the function to
simplify many of the case statements.

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-5-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Set driver default message level. · 8fb35cd3

Committed by Michael Chan on Oct 12, 2020

Currently, bp->msg_enable has a default value of 0.  It is more useful
to have the commonly used NETIF_MSG_DRV and NETIF_MSG_HW enabled by
default.

v2: Change the fallback bnxt_reset_task() inside bnxt_rx_ring_reset()
to silent mode.  With older fw, we would take the fallback path and
it would be very noisy.
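A userspace sketch of the new default. The NETIF_MSG_* values are mirrored from <linux/netdevice.h> so the example compiles on its own; `struct drv`, `drv_init()`, `msg_on()` and `should_log_reset()` are hypothetical helpers, the last one standing in for a fallback reset path invoked in silent mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirrored from the NETIF_MSG_* enum in <linux/netdevice.h>. */
#define NETIF_MSG_DRV  0x0001
#define NETIF_MSG_LINK 0x0004
#define NETIF_MSG_HW   0x2000

#define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV | NETIF_MSG_HW)

struct drv {
	uint32_t msg_enable;
};

static void drv_init(struct drv *d)
{
	d->msg_enable = DEFAULT_MSG_ENABLE;  /* previously 0 */
}

/* Like the kernel's netif_msg_*() helpers: is this message class enabled? */
static bool msg_on(const struct drv *d, uint32_t cls)
{
	return (d->msg_enable & cls) != 0;
}

/* A fallback reset path can ask to stay quiet, so that older firmware,
 * which takes the fallback path often, does not flood the log. */
static bool should_log_reset(const struct drv *d, bool silent)
{
	return !silent && msg_on(d, NETIF_MSG_HW);
}
```

On a running system the level can still be adjusted per device with `ethtool -s DEVNAME msglvl ...`.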

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-4-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Return -EROFS to user space, if NVM writes are not permitted. · cf223bfa

Committed by Vasundhara Volam on Oct 12, 2020

If NVRAM resources are locked, NVM writes are not permitted.  In such
scenarios, firmware returns the HWRM_ERR_CODE_RESOURCE_LOCKED error to
firmware commands.  Return -EROFS to user space in this case.
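A sketch of the error translation, with a hypothetical `enum fw_status` whose values are placeholders rather than the real HWRM error codes. Only the RESOURCE_LOCKED to -EROFS mapping comes from this commit; the -EINVAL and -EIO mappings are assumptions added to round out the example.

```c
#include <errno.h>

/* Hypothetical firmware status codes; placeholders, not HWRM values. */
enum fw_status {
	FW_STATUS_SUCCESS = 0,
	FW_STATUS_INVALID_PARAMS,
	FW_STATUS_RESOURCE_LOCKED,
	FW_STATUS_OTHER,
};

/* Translate a firmware status into a negative errno for user space. */
static int fw_status_to_errno(enum fw_status st)
{
	switch (st) {
	case FW_STATUS_SUCCESS:
		return 0;
	case FW_STATUS_INVALID_PARAMS:
		return -EINVAL;  /* assumed mapping */
	case FW_STATUS_RESOURCE_LOCKED:
		/* NVRAM is locked: report read-only rather than a generic error. */
		return -EROFS;
	default:
		return -EIO;     /* assumed catch-all */
	}
}
```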

Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/1602493854-29283-2-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Oct 5, 2020 · 10 commits

bnxt_en: Eliminate unnecessary RX resets. · 8d4bd96b