Commit 4de9eb20, author: Michael Chan, committer: Zheng Zengkai

bnxt_en: Fix possible unintended driver initiated error recovery

stable inclusion
from stable-5.10.68
commit 52a7e6667133553a51f93076f96c9294314ae44f
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=52a7e6667133553a51f93076f96c9294314ae44f

--------------------------------

[ Upstream commit 1b2b9183 ]

If error recovery is already enabled, bnxt_timer() will periodically
check the heartbeat register and the reset counter.  If we get an
error recovery async. notification from the firmware (e.g. change in
primary/secondary role), we will immediately read and update the
heartbeat register and the reset counter.  If the timer for the next
health check expires soon after this, we may read the heartbeat register
again in quick succession and find that it hasn't changed.  This will
trigger error recovery unintentionally.
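
The false trigger described above can be illustrated with a small userspace model. This is only a sketch: struct hb_model, hb_async_update() and hb_check() are hypothetical names, not driver code, and the model assumes the health check treats an unchanged heartbeat value as a firmware failure. It leaves out the reset counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the driver's saved heartbeat state. */
struct hb_model {
	uint32_t last_heartbeat;	/* value seen at the previous read */
};

/* Models the old async event path: it re-reads the heartbeat on every
 * notification, even when error recovery is already enabled.
 */
static void hb_async_update(struct hb_model *m, uint32_t heartbeat_now)
{
	assert(m != NULL);
	m->last_heartbeat = heartbeat_now;
}

/* Models the periodic check. Returns true when it would start error
 * recovery because the heartbeat did not advance since the last read.
 */
static bool hb_check(struct hb_model *m, uint32_t heartbeat_now)
{
	assert(m != NULL);
	if (heartbeat_now == m->last_heartbeat)
		return true;
	m->last_heartbeat = heartbeat_now;
	return false;
}
```

In this model, calling hb_async_update(&m, 12) and then hb_check(&m, 12) before the firmware has advanced the heartbeat returns true, although the firmware is healthy.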

The likelihood is small because we also reset fw_health->tmr_counter
which will reset the interval for the next health check.  But the
update is not protected and bnxt_timer() can miss the update and
perform the health check without waiting for the full interval.

Fix it by only reading the heartbeat register and reset counter in
bnxt_async_event_process() if error recovery is transitioning to the
enabled state.  Also add proper memory barriers so that when enabling
for the first time, bnxt_timer() will see the tmr_counter interval and
perform the health check after the full interval has elapsed.
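
The ordering that the fix relies on can be sketched in userspace with C11 atomics. This is an illustrative model, not the driver code: struct health_model, model_async_event() and model_timer_tick() are hypothetical names, and the C11 release and acquire fences only stand in for smp_wmb() and smp_rmb() (they are somewhat stronger than the kernel barriers). The model also omits re-arming the counter after a check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the fields involved in the fix. */
struct health_model {
	atomic_uint tmr_counter;	/* ticks left before the next check */
	uint32_t last_heartbeat;	/* written only while still disabled */
	atomic_bool enabled;
};

/* Models the async event path after the fix. */
static void model_async_event(struct health_model *h, unsigned int ticks,
			      uint32_t heartbeat_now)
{
	bool was_enabled;

	assert(h != NULL);
	was_enabled = atomic_load_explicit(&h->enabled, memory_order_relaxed);

	/* The interval is reset on every notification. */
	atomic_store_explicit(&h->tmr_counter, ticks, memory_order_relaxed);
	if (!was_enabled) {
		/* Only the disabled -> enabled transition reads the heartbeat. */
		h->last_heartbeat = heartbeat_now;
		/* Publish the counter before the flag; stands in for smp_wmb(). */
		atomic_thread_fence(memory_order_release);
		atomic_store_explicit(&h->enabled, true, memory_order_relaxed);
	}
}

/* Models one timer tick. Returns true when a full health check would run. */
static bool model_timer_tick(struct health_model *h)
{
	unsigned int c;

	assert(h != NULL);
	if (!atomic_load_explicit(&h->enabled, memory_order_relaxed))
		return false;
	/* Read the counter only after seeing the flag; stands in for smp_rmb(). */
	atomic_thread_fence(memory_order_acquire);
	c = atomic_load_explicit(&h->tmr_counter, memory_order_relaxed);
	if (c) {
		atomic_store_explicit(&h->tmr_counter, c - 1,
				      memory_order_relaxed);
		return false;
	}
	return true;
}
```

With this pairing, a timer tick that observes enabled as true also observes the counter written before it, so the first check waits the full interval. A second notification while already enabled resets the counter but leaves last_heartbeat untouched.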

Fixes: 7e914027 ("bnxt_en: Enable health monitoring.")
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Parent 480a39f7
@@ -2112,25 +2112,34 @@ static int bnxt_async_event_process(struct bnxt *bp,
 		if (!fw_health)
 			goto async_event_process_exit;
 
-		fw_health->enabled = EVENT_DATA1_RECOVERY_ENABLED(data1);
-		fw_health->master = EVENT_DATA1_RECOVERY_MASTER_FUNC(data1);
-		if (!fw_health->enabled) {
+		if (!EVENT_DATA1_RECOVERY_ENABLED(data1)) {
+			fw_health->enabled = false;
 			netif_info(bp, drv, bp->dev,
 				   "Error recovery info: error recovery[0]\n");
 			break;
 		}
+		fw_health->master = EVENT_DATA1_RECOVERY_MASTER_FUNC(data1);
 		fw_health->tmr_multiplier =
 			DIV_ROUND_UP(fw_health->polling_dsecs * HZ,
 				     bp->current_interval * 10);
 		fw_health->tmr_counter = fw_health->tmr_multiplier;
-		fw_health->last_fw_heartbeat =
-			bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
-		fw_health->last_fw_reset_cnt =
-			bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+		if (!fw_health->enabled) {
+			fw_health->last_fw_heartbeat =
+				bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
+			fw_health->last_fw_reset_cnt =
+				bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+		}
 		netif_info(bp, drv, bp->dev,
 			   "Error recovery info: error recovery[1], master[%d], reset count[%u], health status: 0x%x\n",
 			   fw_health->master, fw_health->last_fw_reset_cnt,
 			   bnxt_fw_health_readl(bp, BNXT_FW_HEALTH_REG));
+		if (!fw_health->enabled) {
+			/* Make sure tmr_counter is set and visible to
+			 * bnxt_health_check() before setting enabled to true.
+			 */
+			smp_wmb();
+			fw_health->enabled = true;
+		}
 		goto async_event_process_exit;
 	}
 	case ASYNC_EVENT_CMPL_EVENT_ID_DEBUG_NOTIFICATION:
@@ -10738,6 +10747,8 @@ static void bnxt_fw_health_check(struct bnxt *bp)
 	if (!fw_health->enabled || test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
 		return;
 
+	/* Make sure it is enabled before checking the tmr_counter. */
+	smp_rmb();
 	if (fw_health->tmr_counter) {
 		fw_health->tmr_counter--;
 		return;
......