bnxt_en: Fix possible unintended driver initiated error recovery
stable inclusion from stable-5.10.68 commit 52a7e6667133553a51f93076f96c9294314ae44f bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=52a7e6667133553a51f93076f96c9294314ae44f -------------------------------- [ Upstream commit 1b2b9183 ] If error recovery is already enabled, bnxt_timer() will periodically check the heartbeat register and the reset counter. If we get an error recovery async. notification from the firmware (e.g. change in primary/secondary role), we will immediately read and update the heartbeat register and the reset counter. If the timer for the next health check expires soon after this, we may read the heartbeat register again in quick succession and find that it hasn't changed. This will trigger error recovery unintentionally. The likelihood is small because we also reset fw_health->tmr_counter which will reset the interval for the next health check. But the update is not protected and bnxt_timer() can miss the update and perform the health check without waiting for the full interval. Fix it by only reading the heartbeat register and reset counter in bnxt_async_event_process() if error recovery is trasitioning to the enabled state. Also add proper memory barriers so that when enabling for the first time, bnxt_timer() will see the tmr_counter interval and perform the health check after the full interval has elapsed. Fixes: 7e914027 ("bnxt_en: Enable health monitoring.") Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NMichael Chan <michael.chan@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Showing
想要评论请 注册 或 登录