    net/mlx5e: Recover Send Queue (SQ) from error state · db75373c
    Committed by Eran Ben Elisha
    An error TX completion (CQE) arriving on a specific SQ indicates that
    the hardware has moved this SQ to the error state. This means all
    pending and incoming TX requests are dropped or will be dropped, and no
    further "good" CQEs will be generated for that SQ.
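
    For illustration only, the check below sketches how a driver might tell
    an error completion apart from a good one. It assumes an mlx5-style CQE
    layout in which the opcode sits in the upper four bits of an `op_own`
    byte; the struct `sketch_cqe`, the helpers and the opcode values are
    hypothetical stand-ins, not the driver's or the hardware's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcode values, loosely modeled on mlx5 CQE opcodes.
 * Treat them as assumptions, not the hardware definition. */
enum sketch_cqe_opcode {
	SKETCH_CQE_REQ     = 0x0, /* normal send completion */
	SKETCH_CQE_REQ_ERR = 0xd, /* send completed with an error */
};

/* Hypothetical, trimmed-down completion entry: opcode in the upper
 * 4 bits of op_own, ownership bit in bit 0. */
struct sketch_cqe {
	uint8_t op_own;
	uint8_t syndrome; /* error detail, only meaningful on error CQEs */
};

static uint8_t sketch_cqe_opcode(const struct sketch_cqe *cqe)
{
	return (uint8_t)(cqe->op_own >> 4);
}

/* True when the completion reports that the SQ entered the error
 * state, after which no further good CQEs are expected on it. */
static bool sketch_cqe_is_sq_error(const struct sketch_cqe *cqe)
{
	return sketch_cqe_opcode(cqe) == SKETCH_CQE_REQ_ERR;
}
```

    Only the opcode decides the verdict here; the syndrome byte would
    matter for logging or diagnostics, not for the decision to recover.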
    
    Before this patch, error TX completions (CQEs) were not monitored and
    were handled as regular CQEs. This left the SQ stuck in the error
    state, making it useless for transmitting new packets.
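
    A minimal sketch of the missing step: the TX completion loop inspects
    each CQE and, on the first error CQE, flags the SQ and queues recovery
    exactly once instead of treating that CQE as a regular completion.
    `struct sketch_poll_sq`, `sketch_queue_recovery()` and the boolean
    error array are hypothetical; a kernel driver would more likely use an
    atomic state bit (e.g. test_and_set_bit()) and a workqueue rather than
    plain fields. How completed WQEs are accounted for is deliberately left
    out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-SQ bookkeeping; field names are illustrative only. */
struct sketch_poll_sq {
	size_t   good_cqes;        /* regular completions seen */
	size_t   error_cqes;       /* error completions seen */
	bool     recovering;       /* set once recovery has been queued */
	unsigned recover_requests; /* how many times recovery was queued */
};

/* Stand-in for queueing the recovery work item; here it only counts. */
static void sketch_queue_recovery(struct sketch_poll_sq *sq)
{
	sq->recover_requests++;
}

/* Walk n completions; is_error[i] marks an error CQE. Error CQEs are
 * routed to recovery, which is queued at most once per SQ. */
static size_t sketch_poll_tx_cq(struct sketch_poll_sq *sq,
				const bool *is_error, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!is_error[i]) {
			sq->good_cqes++;
			continue;
		}
		sq->error_cqes++;
		if (!sq->recovering) {
			sq->recovering = true;
			sketch_queue_recovery(sq);
		}
	}
	return i;
}
```

    Guarding the queueing with a flag keeps a burst of error CQEs from
    scheduling the same recovery repeatedly.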
    
    Mitigation plan:
    In case of an error completion, schedule a recovery work item that does
    the following:
    - Mark the TXQ as DRV_XOFF to prevent new packets from arriving from
      the stack.
    - Let NAPI flush all pending SQ WQEs (via the flush_in_error_en bit) to
      release SW and HW resources (SKB, DMA, etc.) and to bring the SQ and
      CQ consumer/producer indices back in sync.
    - Modify the SQ state ERR -> RST -> RDY (restart the SQ).
    - Reactivate the SQ and reset the SQ cc and pc (consumer and producer
      counters).
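
    The recovery steps above can be modeled as a small userspace state
    machine. Everything here is hypothetical: `txq_stopped` stands in for
    DRV_XOFF, `sketch_napi_flush()` stands in for NAPI draining the flushed
    WQEs, and `recovered` plays the role of the SW recovery counter. It
    only shows the ordering of the steps, not the driver's actual firmware
    commands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SQ states mirroring ERR -> RST -> RDY; not firmware values. */
enum sketch_sq_state { SQ_RST, SQ_RDY, SQ_ERR };

struct sketch_sq {
	enum sketch_sq_state state;
	uint16_t pc;        /* producer counter: WQEs posted */
	uint16_t cc;        /* consumer counter: WQEs completed or flushed */
	bool txq_stopped;   /* stands in for DRV_XOFF on the TXQ */
	unsigned recovered; /* SW counter of successful recoveries */
};

/* Apply one transition only if the SQ is in the expected state. */
static int sketch_modify_sq(struct sketch_sq *sq, enum sketch_sq_state from,
			    enum sketch_sq_state to)
{
	if (sq->state != from)
		return -1;
	sq->state = to;
	return 0;
}

/* Stand-in for NAPI draining flushed WQEs until cc catches up with pc. */
static void sketch_napi_flush(struct sketch_sq *sq)
{
	while (sq->cc != sq->pc)
		sq->cc++; /* a driver would free one WQE's SKB/DMA here */
}

/* Returns 0 on success; on failure the TXQ is left stopped. */
static int sketch_recover_sq(struct sketch_sq *sq)
{
	/* 1. Stop the stack from handing over new packets. */
	sq->txq_stopped = true;

	/* 2. Drain all pending WQEs so the indices are in sync. */
	sketch_napi_flush(sq);
	assert(sq->cc == sq->pc);

	/* 3. Restart the SQ: ERR -> RST -> RDY. */
	if (sketch_modify_sq(sq, SQ_ERR, SQ_RST) ||
	    sketch_modify_sq(sq, SQ_RST, SQ_RDY))
		return -1;

	/* 4. Reactivate with fresh counters and reopen the TXQ. */
	sq->cc = 0;
	sq->pc = 0;
	sq->txq_stopped = false;
	sq->recovered++;
	return 0;
}
```

    Calling `sketch_recover_sq()` on an SQ that is not in `SQ_ERR` fails at
    the first transition, which is one simple way to reject a spurious
    recovery request.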
    
    If two consecutive SQ recovery requests arrive less than 500 msecs
    apart, drop the latter request to avoid CPU overload, as this scenario
    is most likely caused by a severe, recurring bug.
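
    The 500 msec guard can be sketched as a simple time-based limiter.
    Timestamps are passed in as milliseconds to keep the example
    deterministic (a kernel driver would more likely compare jiffies with
    helpers such as time_before()); the struct and function names are
    hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RECOVER_MIN_INTERVAL_MS 500u

struct sketch_recover_limiter {
	bool     have_last; /* false until the first recovery is accepted */
	uint64_t last_ms;   /* time of the last accepted recovery request */
	unsigned dropped;   /* requests rejected as too frequent */
};

/* Returns true if a recovery requested at now_ms should run.
 * Assumes now_ms never goes backwards (monotonic clock). */
static bool sketch_should_recover(struct sketch_recover_limiter *l,
				  uint64_t now_ms)
{
	if (l->have_last &&
	    now_ms - l->last_ms < SKETCH_RECOVER_MIN_INTERVAL_MS) {
		l->dropped++;
		return false;
	}
	l->have_last = true;
	l->last_ms = now_ms;
	return true;
}
```

    In this sketch a dropped request does not refresh the timestamp, so
    the interval is always measured from the last recovery that actually
    ran; that choice is an assumption, not something the commit message
    specifies.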
    
    In addition, add an SQ recovery SW counter to track successful
    recoveries.

    Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>