1. 12 Jan, 2017 2 commits
  2. 10 Jan, 2017 2 commits
    • scsi: qla2xxx: Fix apparent cut-n-paste error. · c3c42394
      Dave Jones committed
      Commit 093df737 ("scsi: qla2xxx: Fix Target mode handling with
      Multiqueue changes.") introduces two bodies of code that look similar
      but with s/req/rsp/ in the second instance.  But in one case, it looks
      like this conversion was missed.
      Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
      Reviewed-by: Laurence Oberman <loberman@redhat.com>
      Acked-by: Quinn Tran <Quinn.Tran@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: qla2xxx: Get mutex lock before checking optrom_state · c7702b8c
      Milan P. Gandhi committed
      There is a race condition in the qla2xxx optrom functions where one thread
      might modify the optrom buffer and optrom_state while another thread is
      still reading from them.
      
      In a couple of crashes, it was found that we had successfully passed the
      following 'if' check where we confirm optrom_state to be
      QLA_SREADING. But by the time we acquired the mutex lock to proceed with
      the memory_read_from_buffer function, some other thread/process had
      already modified the option ROM buffer and changed optrom_state from
      QLA_SREADING to QLA_SWAITING. Then we got ha->optrom_buffer 0x0 and
      crashed the system:
      
              if (ha->optrom_state != QLA_SREADING)
                      return 0;
      
              mutex_lock(&ha->optrom_mutex);
              rval = memory_read_from_buffer(buf, count, &off, ha->optrom_buffer,
                  ha->optrom_region_size);
              mutex_unlock(&ha->optrom_mutex);
      
      With the current optrom functions we get the following crash due to the
      race condition:
      
      [ 1479.466679] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [ 1479.466707] IP: [<ffffffff81326756>] memcpy+0x6/0x110
      [...]
      [ 1479.473673] Call Trace:
      [ 1479.474296]  [<ffffffff81225cbc>] ? memory_read_from_buffer+0x3c/0x60
      [ 1479.474941]  [<ffffffffa01574dc>] qla2x00_sysfs_read_optrom+0x9c/0xc0 [qla2xxx]
      [ 1479.475571]  [<ffffffff8127e76b>] read+0xdb/0x1f0
      [ 1479.476206]  [<ffffffff811fdf9e>] vfs_read+0x9e/0x170
      [ 1479.476839]  [<ffffffff811feb6f>] SyS_read+0x7f/0xe0
      [ 1479.477466]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
      
      The patch below modifies the qla2x00_sysfs_read_optrom and
      qla2x00_sysfs_write_optrom functions to take the mutex lock before
      checking ha->optrom_state, to avoid similar crashes.
      
      The patch was applied and tested, and the same crashes were no longer
      observed.
      Tested-by: Milan P. Gandhi <mgandhi@redhat.com>
      Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com>
      Reviewed-by: Laurence Oberman <loberman@redhat.com>
      Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
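The ordering change described in c7702b8c can be sketched in userspace C. This is a minimal illustration, not the driver code: `hw_sketch`, `read_optrom_sketch`, and the use of a pthread mutex in place of the kernel mutex are all assumptions made for the example. The point it shows is that the state check moves inside the locked region, so the buffer cannot be released between the check and the copy.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-ins for the driver's state values and hardware struct. */
enum optrom_state { QLA_SWAITING, QLA_SREADING };

struct hw_sketch {
    pthread_mutex_t optrom_mutex;   /* models ha->optrom_mutex */
    enum optrom_state optrom_state; /* models ha->optrom_state */
    char *optrom_buffer;            /* models ha->optrom_buffer */
    size_t optrom_region_size;      /* models ha->optrom_region_size */
};

/*
 * Copy up to count bytes starting at offset off into buf.
 * The state is checked only after the mutex is held, so another thread
 * cannot flip the state and drop the buffer between the check and the copy.
 * Returns the number of bytes copied, or 0 if no read is in progress.
 */
static ssize_t read_optrom_sketch(struct hw_sketch *ha, char *buf,
                                  size_t off, size_t count)
{
    ssize_t rval = 0;

    pthread_mutex_lock(&ha->optrom_mutex);

    if (ha->optrom_state != QLA_SREADING)
        goto out;
    if (off >= ha->optrom_region_size)
        goto out;
    if (count > ha->optrom_region_size - off)
        count = ha->optrom_region_size - off;

    memcpy(buf, ha->optrom_buffer + off, count);
    rval = (ssize_t)count;
out:
    pthread_mutex_unlock(&ha->optrom_mutex);
    return rval;
}
```

Using a single exit label keeps every return path paired with an unlock, which is the usual shape for this kind of fix.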
  3. 25 Dec, 2016 1 commit
  4. 15 Dec, 2016 5 commits
  5. 10 Dec, 2016 1 commit
  6. 18 Nov, 2016 6 commits
  7. 15 Nov, 2016 1 commit
    • scsi: qla2xxx: do not abort all commands in the adapter during EEH recovery · c733ab35
      Mauricio Faria de Oliveira committed
      The previous commit 1535aa75 ("qla2xxx: fix invalid DMA access after
      command aborts in PCI device remove") introduced a regression during an
      EEH recovery, since the change to the qla2x00_abort_all_cmds() function
      calls qla2xxx_eh_abort(), which verifies the EEH recovery condition but
      handles it heavy-handedly (commit a465537a "qla2xxx: Disable the
      adapter and skip error recovery in case of register disconnect.").
      
      This problem warrants a more general/optimistic solution right in
      qla2xxx_eh_abort() (e.g., in case a real command abort arrives during EEH
      recovery, or if it takes long enough to trigger command aborts); but
      it's still worth adding a check to ensure the code added by the previous
      commit is correct and contained within its owner function.
      
      This commit just adds an 'if (!ha->flags.eeh_busy)' check around it.
      (ahem; a trivial fix for this -rc series; sorry for this oversight.)
      
      With it applied, both PCI device remove and EEH recovery work fine.
      
      Fixes: 1535aa75 ("scsi: qla2xxx: fix invalid DMA access after command aborts in PCI device remove")
      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
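The guard described in c733ab35 can be illustrated with a small userspace model. Everything named `*_sketch` here is hypothetical; only the `eeh_busy` flag and the idea of wrapping the abort call in `if (!ha->flags.eeh_busy)` come from the commit message. The real function also completes each command back to the SCSI midlayer, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the adapter state touched by the fix. */
struct adapter_sketch {
    bool eeh_busy;      /* models ha->flags.eeh_busy */
    int aborts_issued;  /* counts calls into the per-command abort path */
};

/* Stand-in for the per-command abort handler (qla2xxx_eh_abort in the driver). */
static void eh_abort_sketch(struct adapter_sketch *ha)
{
    ha->aborts_issued++;
}

/*
 * Walk the outstanding commands. The abort call is skipped while EEH
 * recovery is in progress, so the heavy-handed path is never entered
 * from here during recovery.
 */
static void abort_all_cmds_sketch(struct adapter_sketch *ha, size_t outstanding)
{
    for (size_t i = 0; i < outstanding; i++) {
        if (!ha->eeh_busy)
            eh_abort_sketch(ha);
        /* Completing the command back to the caller is omitted in this sketch. */
    }
}
```

As the commit message notes, this only contains the problem inside the calling function; a real abort arriving during EEH recovery would still reach the abort handler by other paths.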
  8. 09 Nov, 2016 2 commits
  9. 02 Nov, 2016 1 commit
    • scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init · a5dd506e
      Bill Kuzeja committed
      A system can get hung task timeouts if a QLogic board fails during
      initialization (if the board breaks again or fails the init). The hang
      involves the scsi scan.
      
      In a nutshell, since commit beb9e315 ("qla2xxx: Prevent removal and
      board_disable race"), it is possible for ha (base_vha->hw) to be freed
      early by a call to qla2x00_remove_one when pdev->enable_cnt equals zero:
      
             if (!atomic_read(&pdev->enable_cnt)) {
                     scsi_host_put(base_vha->host);
                     kfree(ha);
                     pci_set_drvdata(pdev, NULL);
                     return;
             }
      
      Almost always, the scsi_host_put above frees the vha structure
      (attached to the end of the Scsi_Host we're putting) since it's the last
      put, and life is good.  However, if we are entering this routine because
      the adapter has broken sometime during initialization AND a scsi scan is
      already in progress (and has done its own scsi_host_get), vha will not
      be freed. What's worse, the scsi scan will access the freed ha structure
      through qla2xxx_scan_finished:
      
              if (time > vha->hw->loop_reset_delay * HZ)
                      return 1;
      
      The scsi scan keeps checking to see if a scan is complete by calling
      qla2xxx_scan_finished. There is a timeout value that limits the length
      of time a scan can take (hw->loop_reset_delay, usually set to 5
      seconds), but this definition is in the data structure (hw) that can get
      freed early.
      
      This can yield unpredictable results, the worst of which is that the
      scsi scan can hang indefinitely. This happens when the freed structure
      gets reused and loop_reset_delay gets overwritten with garbage, which
      the scan obliviously uses as its timeout value.
      
      The fix for this is simple: at the top of qla2xxx_scan_finished, check
      for the UNLOADING bit in the vha structure (vha is not freed at this
      point).  If UNLOADING is set, we exit the scan for this adapter
      immediately, before any reference to the ha structure is made, and
      continue on.
      
      This problem is hard to hit, but I have run into it doing negative
      testing many times now (with a test specifically designed to bring it
      out), so I can verify that this fix works. My testing has been against a
      RHEL7 driver variant, but the bug and patch are equally relevant to
      the upstream driver.
      
      Fixes: beb9e315 ("qla2xxx: Prevent removal and board_disable race")
      Cc: <stable@vger.kernel.org> # v3.18+
      Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
      Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
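The early-exit check described in a5dd506e can be sketched as follows. This is a userspace model under stated assumptions: `scan_hw_sketch`, `vha_sketch`, the boolean `unloading` field, and `HZ_SKETCH` are hypothetical stand-ins for the driver's structures, its UNLOADING bit, and the kernel's HZ. Any other completion conditions the real function checks are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define HZ_SKETCH 100UL  /* hypothetical tick rate standing in for the kernel's HZ */

/* Models the hw structure that may be freed early during removal. */
struct scan_hw_sketch {
    unsigned long loop_reset_delay;  /* scan timeout in seconds */
};

/* Models the vha structure, which outlives hw while the scan holds a reference. */
struct vha_sketch {
    bool unloading;               /* models the UNLOADING bit */
    struct scan_hw_sketch *hw;    /* must not be dereferenced once unloading */
};

/* Returns 1 when the scan should stop polling, 0 to keep waiting. */
static int scan_finished_sketch(const struct vha_sketch *vha, unsigned long time)
{
    /* Checked first, so the possibly freed hw structure is never read. */
    if (vha->unloading)
        return 1;

    if (time > vha->hw->loop_reset_delay * HZ_SKETCH)
        return 1;

    return 0;
}
```

Placing the flag test ahead of the timeout comparison is the whole fix: the timeout value lives in the structure that can disappear, while the flag lives in the one that cannot.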
  10. 15 Sep, 2016 1 commit
  11. 31 Aug, 2016 1 commit
  12. 09 Aug, 2016 1 commit
  13. 16 Jul, 2016 16 commits