1. 06 5月, 2016 1 次提交
    • M
      cxlflash: Fix to resolve dead-lock during EEH recovery · 635f6b08
      Manoj N. Kumar 提交于
      When a cxlflash adapter goes into EEH recovery and multiple processes
      (each having established its own context) are active, the EEH recovery
      can hang if the processes attempt to recover in parallel. The symptom
      logged after a couple of minutes is:
      
      INFO: task eehd:48 blocked for more than 120 seconds.
      Not tainted 4.5.0-491-26f710d+ #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      eehd            0    48      2
      Call Trace:
      __switch_to+0x2f0/0x410
      __schedule+0x300/0x980
      schedule+0x48/0xc0
      rwsem_down_write_failed+0x294/0x410
      down_write+0x88/0xb0
      cxlflash_pci_error_detected+0x100/0x1c0 [cxlflash]
      cxl_vphb_error_detected+0x88/0x110 [cxl]
      cxl_pci_error_detected+0xb0/0x1d0 [cxl]
      eeh_report_error+0xbc/0x130
      eeh_pe_dev_traverse+0x94/0x160
      eeh_handle_normal_event+0x17c/0x450
      eeh_handle_event+0x184/0x370
      eeh_event_handler+0x1c8/0x1d0
      kthread+0x110/0x130
      ret_from_kernel_thread+0x5c/0xa4
      INFO: task blockio:33215 blocked for more than 120 seconds.
      
      Not tainted 4.5.0-491-26f710d+ #1
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      blockio         0 33215  33213
      Call Trace:
      0x1 (unreliable)
      __switch_to+0x2f0/0x410
      __schedule+0x300/0x980
      schedule+0x48/0xc0
      rwsem_down_read_failed+0x124/0x1d0
      down_read+0x68/0x80
      cxlflash_ioctl+0x70/0x6f0 [cxlflash]
      scsi_ioctl+0x3b0/0x4c0
      sg_ioctl+0x960/0x1010
      do_vfs_ioctl+0xd8/0x8c0
      SyS_ioctl+0xd4/0xf0
      system_call+0x38/0xb4
      INFO: task eehd:48 blocked for more than 120 seconds.
      
      The hang is because of a 3 way dead-lock:
      
      Process A holds the recovery mutex, and waits for eehd to complete.
      Process B holds the semaphore and waits for the recovery mutex.
      eehd waits for semaphore.
      
      The fix is to have Process B above release the semaphore before
      attempting to acquire the recovery mutex. This will allow
      eehd to proceed to completion.
      Signed-off-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com>
      Reviewed-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      635f6b08
  2. 04 5月, 2016 2 次提交
  3. 30 4月, 2016 25 次提交
  4. 26 4月, 2016 6 次提交
  5. 16 4月, 2016 6 次提交