- 17 11月, 2020 1 次提交
-
-
由 Michael Chan 提交于
Firmware is unable to retain the port counters during any kind of fatal or non-fatal resets, so we must clear the port counters to avoid false detection of port counter overflow. Fixes: fea6b333 ("bnxt_en: Accumulate all counters.") Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Reviewed-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 27 10月, 2020 5 次提交
-
-
由 Vasundhara Volam 提交于
In the AER or firmware reset flow, if we are in fatal error state or if pci_channel_offline() is true, we don't send any commands to the firmware because the commands will likely not reach the firmware and most commands don't matter much because the firmware is likely to be reset imminently. However, the HWRM_FUNC_RESET command is different and we should always attempt to send it. In the AER flow for example, the .slot_reset() call will trigger this fw command and we need to try to send it to effect the proper reset. Fixes: b340dc68 ("bnxt_en: Avoid sending firmware messages when AER error is detected.") Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Michael Chan 提交于
bnxt_open_nic() is called during configuration changes that require the NIC to be closed and then opened. This call is protected by rtnl_lock. Firmware reset can be happening at the same time. Only critical portions of the entire firmware reset sequence are protected by the rtnl_lock. It is possible that bnxt_open_nic() can be called when the firmware reset sequence is aborting. In that case, bnxt_open_nic() needs to check if the ABORT_ERR flag is set and abort if it is. The configuration change that resulted in the bnxt_open_nic() call will fail but the NIC will be brought to a consistent IF_DOWN state. Without this patch, if bnxt_open_nic() were to continue in this error state, it may crash like this: [ 1648.659736] BUG: unable to handle kernel NULL pointer dereference at (null) [ 1648.659768] IP: [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.659796] PGD 101e1b3067 PUD 101e1b2067 PMD 0 [ 1648.659813] Oops: 0000 [#1] SMP [ 1648.659825] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc dell_smbios dell_wmi_descriptor dcdbas amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper vfat cryptd fat pcspkr ipmi_ssif sg k10temp i2c_piix4 wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm libahci megaraid_sas crct10dif_pclmul crct10dif_common [ 1648.660063] tg3 libata crc32c_intel bnxt_en(OE) drm_panel_orientation_quirks devlink ptp pps_core dm_mirror dm_region_hash dm_log dm_mod fuse [ 1648.660105] CPU: 13 PID: 3867 Comm: ethtool Kdump: loaded Tainted: G OE ------------ 3.10.0-1152.el7.x86_64 #1 [ 1648.660911] Hardware name: Dell Inc. PowerEdge R7515/0R4CNN, BIOS 1.2.14 01/28/2020 [ 1648.661662] task: ffff94e64cbc9080 ti: ffff94f55df1c000 task.ti: ffff94f55df1c000 [ 1648.662409] RIP: 0010:[<ffffffffc01e9b3a>] [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.663171] RSP: 0018:ffff94f55df1fba8 EFLAGS: 00010202 [ 1648.663927] RAX: 0000000000000000 RBX: ffff94e6827e0000 RCX: 0000000000000000 [ 1648.664684] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff94e6827e08c0 [ 1648.665433] RBP: ffff94f55df1fc20 R08: 00000000000001ff R09: 0000000000000008 [ 1648.666184] R10: 0000000000000d53 R11: ffff94f55df1f7ce R12: ffff94e6827e08c0 [ 1648.666940] R13: ffff94e6827e08c0 R14: ffff94e6827e08c0 R15: ffffffffb9115e40 [ 1648.667695] FS: 00007f8aadba5740(0000) GS:ffff94f57eb40000(0000) knlGS:0000000000000000 [ 1648.668447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1648.669202] CR2: 0000000000000000 CR3: 0000001022772000 CR4: 0000000000340fe0 [ 1648.669966] Call Trace: [ 1648.670730] [<ffffffffc01f1d5d>] ? bnxt_need_reserve_rings+0x9d/0x170 [bnxt_en] [ 1648.671496] [<ffffffffc01fa7ea>] __bnxt_open_nic+0x8a/0x9a0 [bnxt_en] [ 1648.672263] [<ffffffffc01f7479>] ? bnxt_close_nic+0x59/0x1b0 [bnxt_en] [ 1648.673031] [<ffffffffc01fb11b>] bnxt_open_nic+0x1b/0x50 [bnxt_en] [ 1648.673793] [<ffffffffc020037c>] bnxt_set_ringparam+0x6c/0xa0 [bnxt_en] [ 1648.674550] [<ffffffffb8a5f564>] dev_ethtool+0x1334/0x21a0 [ 1648.675306] [<ffffffffb8a719ff>] dev_ioctl+0x1ef/0x5f0 [ 1648.676061] [<ffffffffb8a324bd>] sock_do_ioctl+0x4d/0x60 [ 1648.676810] [<ffffffffb8a326bb>] sock_ioctl+0x1eb/0x2d0 [ 1648.677548] [<ffffffffb8663230>] do_vfs_ioctl+0x3a0/0x5b0 [ 1648.678282] [<ffffffffb8b8e678>] ? __do_page_fault+0x238/0x500 [ 1648.679016] [<ffffffffb86634e1>] SyS_ioctl+0xa1/0xc0 [ 1648.679745] [<ffffffffb8b93f92>] system_call_fastpath+0x25/0x2a [ 1648.680461] Code: 9e 60 01 00 00 0f 1f 40 00 45 8b 8e 48 01 00 00 31 c9 45 85 c9 0f 8e 73 01 00 00 66 0f 1f 44 00 00 49 8b 86 a8 00 00 00 48 63 d1 <48> 8b 14 d0 48 85 d2 0f 84 46 01 00 00 41 8b 86 44 01 00 00 c7 [ 1648.681986] RIP [<ffffffffc01e9b3a>] bnxt_alloc_mem+0x50a/0x1140 [bnxt_en] [ 1648.682724] RSP <ffff94f55df1fba8> [ 1648.683451] CR2: 0000000000000000 Fixes: ec5d31e3 ("bnxt_en: Handle firmware reset status during IF_UP.") Reviewed-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: NPavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vasundhara Volam 提交于
When a PCIe fatal error occurs, the internal latched BAR addresses in the chip get reset even though the BAR register values in config space are retained. pci_restore_state() will not rewrite the BAR addresses if the BAR address values are valid, causing the chip's internal BAR addresses to stay invalid. So we need to zero the BAR registers during PCIe fatal error to force pci_restore_state() to restore the BAR addresses. These write cycles to the BAR registers will cause the proper BAR addresses to latch internally. Fixes: 6316ea6d ("bnxt_en: Enable AER support.") Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vasundhara Volam 提交于
As part of the commit b148bb23 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task()."), cancel_delayed_work_sync() is called only for VFs to fix a possible crash by cancelling any pending delayed work items. It was assumed by mistake that the flush_workqueue() call on the PF would flush delayed work items as well. As flush_workqueue() does not cancel the delayed workqueue, extend the fix for PFs. This fix will avoid the system crash, if there are any pending delayed work items in fw_reset_task() during driver's .remove() call. Unify the workqueue cleanup logic for both PF and VF by calling cancel_work_sync() and cancel_delayed_work_sync() directly in bnxt_remove_one(). Fixes: b148bb23 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().") Reviewed-by: NPavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: NAndy Gospodarek <gospo@broadcom.com> Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vasundhara Volam 提交于
A recent patch has moved the workqueue cleanup logic before calling unregister_netdev() in bnxt_remove_one(). This caused a regression because the workqueue can be restarted if the device is still open. Workqueue cleanup must be done after unregister_netdev(). The workqueue will not restart itself after the device is closed. Call bnxt_cancel_sp_work() after unregister_netdev() and call bnxt_dl_fw_reporters_destroy() after that. This fixes the regession and the original NULL ptr dereference issue. Fixes: b16939b5 ("bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()") Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 13 10月, 2020 6 次提交
-
-
由 Vasundhara Volam 提交于
Add a new bnxt_hwrm_nvm_get_dev_info() to query firmware version information via NVM_GET_DEV_INFO firmware command. Use it to get the running version of the NVM configuration information. This new function will also be used in subsequent patches to get the stored firmware versions. Reviewed-by: NAndy Gospodarek <gospo@broadcom.com> Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1602493854-29283-8-git-send-email-michael.chan@broadcom.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Michael Chan 提交于
If the VF virtual link is set to always enabled, the speed may be unknown when the physical link is down. The driver currently logs the link speed as 4294967295 Mbps which is SPEED_UNKNOWN. Modify the link up log message as "speed unknown" which makes more sense. Reviewed-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1602493854-29283-7-git-send-email-michael.chan@broadcom.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Michael Chan 提交于
Log these values that contain useful firmware state information. Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Reviewed-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1602493854-29283-6-git-send-email-michael.chan@broadcom.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Michael Chan 提交于
event_data1 and event_data2 are used when processing most events. Store these in local variables at the beginning of the function to simplify many of the case statements. Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1602493854-29283-5-git-send-email-michael.chan@broadcom.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Michael Chan 提交于
Currently, bp->msg_enable has default value of 0. It is more useful to have the commonly used NETIF_MSG_DRV and NETIF_MSG_HW enabled by default. v2: Change the fall back bnxt_reset_task() inside bnxt_rx_ring_reset() to silent mode. With older fw, we would take the fall back path and it would be very noisy. Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Reviewed-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1602493854-29283-4-git-send-email-michael.chan@broadcom.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vasundhara Volam 提交于
If NVRAM resources are locked, NVM writes are not permitted. In such scenarios, firmware returns HWRM_ERR_CODE_RESOURCE_LOCKED error to firmware commands. Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/1602493854-29283-2-git-send-email-michael.chan@broadcom.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 05 10月, 2020 10 次提交
-
-
由 Michael Chan 提交于
Currently, the driver will schedule RX ring reset when we get a buffer error in the RX completion record. These RX buffer errors can be due to normal out-of-buffer conditions or a permanent error in the RX ring. Because the driver cannot distinguish between these 2 conditions, we assume all these buffer errors require reset. This is very disruptive when it is just a normal out-of-buffer condition. Newer firmware will now monitor the rings for the permanent failure and will send a notification to the driver when it happens. This allows the driver to reset only when such a notification is received. In environments where we have predominently out-of-buffer conditions, we now can avoid these unnecessary resets. Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
There is logic in the RX path to detect unexpected handles in the RX completion. We'll print a warning and schedule a reset. The next expected handle is then set to 0xffff which is guaranteed to not match any valid handle. This will force all remaining packets in the ring to be discarded before the reset. There can be hundreds of these packets remaining in the ring and there is no need to print the warnings for these forced errors. Reviewed-by: NPavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
Add a per ring rx_resets counter to count these RX resets. Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
On some older chips, it is necessary to do a reset when we get buffer errors associated with an RX ring. These buffer errors may become frequent if the RX ring underruns under heavy traffic. The current code does a global reset of all reasources when this happens. This works but creates a big disruption of all rings when one RX ring is having problem. This patch implements a localized RX ring reset of just the RX ring having the issue. All other rings including all TX rings will not be affected by this single RX ring reset. Only the older chips prior to the P5 class supports this reset. Because it is not a global reset, packets may still be arriving while we are calling firmware to reset that ring. We need to be sure that we don't post any buffers during this time while the ring is undergoing reset. After firmware completes successfully, the ring will be in the reset state with no buffers and we can start filling it with new buffers and posting them. Reviewed-by: NPavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
bnxt_init_one_rx_ring() includes logic to initialize the BDs for one RX ring and to allocate the buffers. Separate the allocation logic into a new bnxt_alloc_one_rx_ring() function. The allocation function will be used later to allocate new buffers for one specified RX ring when we reset that RX ring. Reviewed-by: NPavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
bnxt_free_rx_skbs() frees all the allocated buffers and SKBs for every RX ring. Refactor this function by calling a new function bnxt_free_one_rx_ring_skbs() to free these buffers on one specified RX ring at a time. This is preparation work for resetting one RX ring during run-time. Reviewed-by: NPavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
If firmware does not come out of reset, log FW health status info to provide more information on firmware status. Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Edwin Peer 提交于
The NS3 SoC platforms require assistance from the OP-TEE to recover firmware if a crash occurs while no driver is bound. The CRASHED_NO_MASTER condition is recorded in the firmware status register during the crash to indicate when driver intervension is needed to coordinate a firmware reload. This condition is detected during early driver initialization in order to effect a firmware fastboot on supported platforms when necessary. Reviewed-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Edwin Peer 提交于
Firmware now supports device independent discovery of the status register location. This status register can provide more detailed information about firmware errors, especially if problems occur before the HWRM interface is functioning. Attempt to map this register if it is present and report the firmware status on firmware init failures. Signed-off-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Edwin Peer 提交于
The allocator for the firmware health structure conflates allocation and capability checks, limiting the reusability of the code. This patch separates out the capability check and disablement and improves the warning message to better describe the consequences of an allocation failure. Signed-off-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 9月, 2020 7 次提交
-
-
由 Michael Chan 提交于
This feature allows the user to set the different FEC modes on the NIC port. Any new setting will take effect immediately after a link toggle. Reviewed-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
The current code is reporting the FEC configured settings during link up. Change it to report the more useful active FEC encoding that may be negotiated or auto detected. Reviewed-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
Implement .get_fecparam() method to report the configured and active FEC settings. Also report the supported and advertised FEC settings to the .get_link_ksettings() method. Reviewed-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
On some 200G dual port NICs, if one port is configured to 200G, firmware will disable the ethernet link on the other port. Firmware will send notification to the driver for the disabled port when this happens. Define a new field in the link_info structure to keep track of this state. The new phy_state field replaces the unused loop_back field. Log a message when the phy_state changes state. In the disabled state, disallow any PHY configurations on the disabled port as the firmware will fail all calls to configure the PHY in this state. Reviewed-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Edwin Peer 提交于
The firmware interface has added support for new link speeds using PAM4 modulation. Expand the bnxt_link_info structure to closely mirror the new firmware structures. Add logic to copy the PAM4 capabilities and settings from the firmware. Signed-off-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Edwin Peer 提交于
Extract the code for determining an advertised speed is no longer supported into a separate function. This will avoid some code duplication in a later patch when supporting PAM4 speeds, since these speeds are specified in a separate field. Reviewed-by: NScott Branden <scott.branden@broadcom.com> Signed-off-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
The main changes include FEC, ECN statistics, HWRM_PORT_PHY_QCFG response size reduction, and a new counter added to ctx_hw_stats_ext struct to support the new 58818 chip. The ctx_hw_stats_ext structure is now the superset supporting the new 58818 chips and the prior P5 chips. Add a new flag to identify the new chip and use constants for the chip specific ring statistics sizes instead of the size of the structure. Because the HWRM_PORT_PHY_QCFG response structure size has shrunk back to 96 bytes, the workaround added earlier to limit the size of this message for forwarding to the VF can be removed. Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 9月, 2020 4 次提交
-
-
由 Michael Chan 提交于
The wrong flag value caused the firmware call to return actual port counters instead of the counter masks. This messed up the counter overflow logic and caused erratic extended port counters to be displayed under ethtool -S. Fixes: 531d1d26 ("bnxt_en: Retrieve hardware masks for port counters.") Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael Chan 提交于
Fix it to set the required fid input parameter. The firmware call fails without this patch. Fixes: d752d053 ("bnxt_en: Retrieve hardware counter masks from firmware if available.") Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Edwin Peer 提交于
Returning "unknown" as a temperature value violates the hwmon interface rules. Appropriate error codes should be returned via device_attribute show instead. These will ultimately be propagated to the user via the file system interface. In addition to the corrected error handling, it is an even better idea to not present the sensor in sysfs at all if it is known that the read will definitely fail. Given that temp1_input is currently the only sensor reported, ensure no hwmon registration if TEMP_MONITOR_QUERY is not supported or if it will fail due to access permissions. Something smarter may be needed if and when other sensors are added. Fixes: 12cce90b ("bnxt_en: fix HWRM error when querying VF temperature") Signed-off-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vasundhara Volam 提交于
Using strlcpy() to copy from VPD is not correct because VPD strings are not necessarily NULL terminated. Use memcpy() to copy the VPD length up to the destination buffer size - 1. The destination is zeroed memory so it will always be NULL terminated. Fixes: a0d0fd70 ("bnxt_en: Read partno and serialno of the board from VPD") Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 9月, 2020 1 次提交
-
-
由 Jakub Kicinski 提交于
We allow drivers to call napi_hash_del() before calling netif_napi_del() to batch RCU grace periods. This makes the API asymmetric and leaks internal implementation details. Soon we will want the grace period to protect more than just the NAPI hash table. Restructure the API and have drivers call a new function - __netif_napi_del() if they want to take care of RCU waits. Note that only core was checking the return status from napi_hash_del() so the new helper does not report if the NAPI was actually deleted. Some notes on driver oddness: - veth observed the grace period before calling netif_napi_del() but that should not matter - myri10ge observed normal RCU flavor - bnx2x and enic did not actually observe the grace period (unless they did so implicitly) - virtio_net and enic only unhashed Rx NAPIs The last two points seem to indicate that the calls to napi_hash_del() were a left over rather than an optimization. Regardless, it's easy enough to correct them. This patch may introduce extra synchronize_net() calls for interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on free_netdev() to call netif_napi_del(). This seems inevitable since we want to use RCU for netpoll dev->napi_list traversal, and almost no drivers set IFF_DISABLE_NETPOLL. Signed-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 9月, 2020 2 次提交
-
-
由 Vasundhara Volam 提交于
bnxt_fw_reset_task() which runs from a workqueue can race with bnxt_remove_one(). For example, if firmware reset and VF FLR are happening at about the same time. bnxt_remove_one() already cancels the workqueue and waits for it to finish, but we need to do this earlier before the devlink reporters are destroyed. This will guarantee that the devlink reporters will always be valid when bnxt_fw_reset_task() is still running. Fixes: b148bb23 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().") Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vasundhara Volam 提交于
When the driver goes through PCIe AER reset in error state, all firmware messages will timeout because the PCIe bus is no longer accessible. This can lead to AER reset taking many minutes to complete as each firmware command takes time to timeout. Define a new macro BNXT_NO_FW_ACCESS() to skip these firmware messages when either firmware is in fatal error state or when pci_channel_offline() is true. It now takes a more reasonable 20 to 30 seconds to complete AER recovery. Fixes: b4fff207 ("bnxt_en: Do not send firmware messages if firmware is in error state.") Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 27 8月, 2020 1 次提交
-
-
由 Jakub Kicinski 提交于
Netpoll can try to poll napi as soon as napi_enable() is called. It crashes trying to access a doorbell which is still NULL: BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 59 PID: 6039 Comm: ethtool Kdump: loaded Tainted: G S 5.9.0-rc1-00469-g5fd99b5d-dirty #26 RIP: 0010:bnxt_poll+0x121/0x1c0 Code: c4 20 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 8b 86 a0 01 00 00 41 23 85 18 01 00 00 49 8b 96 a8 01 00 00 0d 00 00 00 24 <89> 02 41 f6 45 77 02 74 cb 49 8b ae d8 01 00 00 31 c0 c7 44 24 1a netpoll_poll_dev+0xbd/0x1a0 __netpoll_send_skb+0x1b2/0x210 netpoll_send_udp+0x2c9/0x406 write_ext_msg+0x1d7/0x1f0 console_unlock+0x23c/0x520 vprintk_emit+0xe0/0x1d0 printk+0x58/0x6f x86_vector_activate.cold+0xf/0x46 __irq_domain_activate_irq+0x50/0x80 __irq_domain_activate_irq+0x32/0x80 __irq_domain_activate_irq+0x32/0x80 irq_domain_activate_irq+0x25/0x40 __setup_irq+0x2d2/0x700 request_threaded_irq+0xfb/0x160 __bnxt_open_nic+0x3b1/0x750 bnxt_open_nic+0x19/0x30 ethtool_set_channels+0x1ac/0x220 dev_ethtool+0x11ba/0x2240 dev_ioctl+0x1cf/0x390 sock_do_ioctl+0x95/0x130 Reported-by: NRob Sherwood <rsher@fb.com> Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: NJakub Kicinski <kuba@kernel.org> Reviewed-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 8月, 2020 3 次提交
-
-
由 Michael Chan 提交于
The recent changes to support user-defined RSS map assume that RX rings are always reserved and the default RSS map is set after the RX rings are successfully reserved. If the firmware spec is older than 1.6.1, no ring reservations are required and the default RSS map is not setup at all. In another scenario where the fw Resource Manager is older, RX rings are not reserved and we also end up with no valid RSS map. Fix both issues in bnxt_need_reserve_rings(). In both scenarios described above, we don't need to reserve RX rings so we need to call this new function bnxt_check_rss_map_no_rmgr() to setup the default RSS map when needed. Without valid RSS map, the NIC won't receive packets properly. Fixes: 1667cbf6 ("bnxt_en: Add logical RSS indirection table structure.") Reviewed-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Edwin Peer 提交于
There are no VF rings available during probe when the device is configured using the Minimal-Static reservation strategy. In this case, the RSS indirection table can only be initialized later, during bnxt_open_nic(). However, this was not happening because the rings will already have been reserved via bnxt_init_dflt_ring_mode(), causing bnxt_need_reserve_rings() to return false in bnxt_reserve_rings() and bypass the RSS table init. Solve this by pushing the call to bnxt_set_dflt_rss_indir_tbl() into __bnxt_reserve_rings(), which is common to both paths and is called whenever ring configuration is changed. After doing this, the RSS table init that must be called from bnxt_init_one() happens implicitly via bnxt_set_default_rings(), necessitating doing the allocation earlier in order to avoid a null pointer dereference. Fixes: bd3191b5 ("bnxt_en: Implement ethtool -X to set indirection table.") Signed-off-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Edwin Peer 提交于
Firmware returns RESOURCE_ACCESS_DENIED for HWRM_TEMP_MONITORY_QUERY for VFs. This produces unpleasing error messages in the log when temp1_input is queried via the hwmon sysfs interface from a VF. The error is harmless and expected, so silence it and return unknown as the value. Since the device temperature is not particularly sensitive information, provide flexibility to change this policy in future by silencing the error rather than avoiding the HWRM call entirely for VFs. Fixes: cde49a42 ("bnxt_en: Add hwmon sysfs support to read temperature") Cc: Marc Smith <msmith626@gmail.com> Reported-by: NMarc Smith <msmith626@gmail.com> Signed-off-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-