1. 10 Apr, 2018 1 commit
    • scsi: aacraid: Insure command thread is not recursively stopped · 1c6b41fb
      Dave Carroll committed
      If a recursive IOP_RESET is invoked, usually due to the eh_thread
      handling errors after the first reset, be sure we flag that the command
      thread has been stopped to avoid an Oops of the form:
      
       [ 336.620256] CPU: 28 PID: 1193 Comm: scsi_eh_0 Kdump: loaded Not tainted 4.14.0-49.el7a.ppc64le #1
       [ 336.620297] task: c000003fd630b800 task.stack: c000003fd61a4000
       [ 336.620326] NIP: c000000000176794 LR: c00000000013038c CTR: c00000000024bc10
       [ 336.620361] REGS: c000003fd61a7720 TRAP: 0300 Not tainted (4.14.0-49.el7a.ppc64le)
       [ 336.620395] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22084022 XER: 20040000
       [ 336.620435] CFAR: c000000000130388 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
       [ 336.620435] GPR00: c00000000013038c c000003fd61a79a0 c0000000014c7e00 0000000000000000
       [ 336.620435] GPR04: 000000000000000c 000000000000000c 9000000000009033 0000000000000477
       [ 336.620435] GPR08: 0000000000000477 0000000000000000 0000000000000000 c008000010f7d940
       [ 336.620435] GPR12: c00000000024bc10 c000000007a33400 c0000000001708a8 c000003fe3b881d8
       [ 336.620435] GPR16: c000003fe3b88060 c000003fd61a7d10 fffffffffffff000 000000000000001e
       [ 336.620435] GPR20: 0000000000000001 c000000000ebf1a0 0000000000000001 c000003fe3b88000
       [ 336.620435] GPR24: 0000000000000003 0000000000000002 c000003fe3b88840 c000003fe3b887e8
       [ 336.620435] GPR28: c000003fe3b88000 c000003fc8181788 0000000000000000 c000003fc8181700
       [ 336.620750] NIP [c000000000176794] exit_creds+0x34/0x160
       [ 336.620775] LR [c00000000013038c] __put_task_struct+0x8c/0x1f0
       [ 336.620804] Call Trace:
       [ 336.620817] [c000003fd61a79a0] [c000003fe3b88000] 0xc000003fe3b88000 (unreliable)
       [ 336.620853] [c000003fd61a79d0] [c00000000013038c] __put_task_struct+0x8c/0x1f0
       [ 336.620889] [c000003fd61a7a00] [c000000000171418] kthread_stop+0x1e8/0x1f0
       [ 336.620922] [c000003fd61a7a40] [c008000010f7448c] aac_reset_adapter+0x14c/0x8d0 [aacraid]
       [ 336.620959] [c000003fd61a7b00] [c008000010f60174] aac_eh_host_reset+0x84/0x100 [aacraid]
       [ 336.621010] [c000003fd61a7b30] [c000000000864f24] scsi_try_host_reset+0x74/0x180
       [ 336.621046] [c000003fd61a7bb0] [c000000000867ac0] scsi_eh_ready_devs+0xc00/0x14d0
       [ 336.625165] [c000003fd61a7ca0] [c0000000008699e0] scsi_error_handler+0x550/0x730
       [ 336.632101] [c000003fd61a7dc0] [c000000000170a08] kthread+0x168/0x1b0
       [ 336.639031] [c000003fd61a7e30] [c00000000000b528] ret_from_kernel_thread+0x5c/0xb4
       [ 336.645971] Instruction dump:
       [ 336.648743] 384216a0 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c7f1b78 60000000 60000000
       [ 336.657056] 39400000 e87f0838 f95f0838 7c0004ac <7d401828> 314affff 7d40192d 40c2fff4
       [ 336.663997] ---[ end trace 4640cf8d4945ad95 ]---
      
      So flag when the thread is stopped by setting the thread pointer to NULL.
      Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
      Reviewed-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@microsemi.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
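The guard described in this commit can be sketched in user space as follows. This is a minimal model, not the driver's code: `struct task`, `struct adapter`, `stop_task()` and `adapter_stop_thread()` are hypothetical stand-ins for the kernel's task structure, the aacraid adapter state, and `kthread_stop()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel thread handle. */
struct task {
    int stop_calls;          /* how many times the thread was stopped */
};

/* Hypothetical stand-in for the adapter state holding the thread pointer. */
struct adapter {
    struct task *thread;     /* command thread; NULL once it has been stopped */
};

/* Models kthread_stop(): stopping the same thread twice is the bug. */
static void stop_task(struct task *t)
{
    t->stop_calls++;
}

/*
 * Stop the command thread at most once.
 * Returns 1 if the thread was stopped by this call, 0 if it was already
 * stopped (for example by an earlier, still unfinished reset).
 */
static int adapter_stop_thread(struct adapter *a)
{
    if (a->thread == NULL)
        return 0;            /* recursive reset: nothing left to stop */

    stop_task(a->thread);
    a->thread = NULL;        /* flag that the thread is gone */
    return 1;
}
```

Calling `adapter_stop_thread()` a second time on the same adapter returns 0 without touching the stale thread handle, which is the behavior the commit message asks for.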
  2. 04 Jan, 2018 17 commits
  3. 29 Nov, 2017 1 commit
    • scsi: aacraid: address UBSAN warning regression · d1853975
      Arnd Bergmann committed
      As reported by Meelis Roos, my previous patch causes an incorrect
      calculation of the timeout, through an undefined signed integer
      overflow:
      
      [   12.228155] UBSAN: Undefined behaviour in drivers/scsi/aacraid/commsup.c:2514:49
      [   12.228229] signed integer overflow:
      [   12.228283] 964297611 * 250 cannot be represented in type 'long int'
      
      The problem is that doing a multiplication with HZ first and then
      dividing by USEC_PER_SEC worked correctly for 32-bit microseconds,
      but not for 32-bit nanoseconds, which would require up to 41 bits.
      
      This reworks the calculation to first convert the nanoseconds into
      jiffies, which should give us the same result as before and not overflow.
      
      Unfortunately I did not understand the exact intention of the algorithm,
      in particular the part where we add half a second, so it's possible that
      there is still a preexisting problem in this function. I added a comment
      that this would be handled more nicely using usleep_range(), which
      generally works better for waking up at a particular time than the
      current schedule_timeout() based implementation. I did not feel
      comfortable trying to implement that without being sure what the
      intent is here though.
      
      Fixes: 820f1886 ("scsi: aacraid: use timespec64 instead of timeval")
      Tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
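The arithmetic in this commit can be illustrated with a small user-space sketch. It assumes HZ is 250 (the factor in the UBSAN report) and models the 32-bit `long int` with `int32_t`; the function names are hypothetical and this is not the code from commsup.c.

```c
#include <assert.h>
#include <stdint.h>

#define HZ            250            /* assumed tick rate, as in the UBSAN report */
#define NSEC_PER_SEC  1000000000L

/*
 * Returns 1 if nsec * HZ does not fit in a signed 32-bit integer,
 * i.e. if "multiply by HZ first" would overflow a 32-bit long.
 * The product is computed in 64 bits so the check itself is well defined.
 */
static int multiply_first_overflows(int32_t nsec)
{
    int64_t product = (int64_t)nsec * HZ;
    return product > INT32_MAX || product < INT32_MIN;
}

/* Reference result: multiply first, but in 64-bit arithmetic. */
static int64_t ticks_wide(int32_t nsec)
{
    return ((int64_t)nsec * HZ) / NSEC_PER_SEC;
}

/*
 * Convert nanoseconds to ticks by dividing by the tick length first.
 * Every intermediate value stays within 32 bits.
 */
static int32_t ticks_divide_first(int32_t nsec)
{
    return (int32_t)(nsec / (NSEC_PER_SEC / HZ));
}
```

For the value from the report, 964297611 ns, the product 964297611 * 250 exceeds the 32-bit range, while dividing by the tick length (4,000,000 ns) first yields 241 ticks, the same result as the 64-bit reference.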
  4. 21 Nov, 2017 2 commits
    • scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path · e4717292
      Guilherme G. Piccoli committed
      As part of the scsi EH path, aacraid performs a reinitialization of the
      adapter, which encompasses freeing resources and IRQs, NULLifying lots of
      pointers, and then initializing it all over again.  We've identified a
      problem during the free IRQ portion of this path if CONFIG_DEBUG_SHIRQ
      is enabled in the kernel config file.
      
      When this flag is set, right after free_irq() effectively clears the
      interrupt, it checks whether the IRQ was requested as IRQF_SHARED. If
      so, it makes one more call to the driver's IRQ handler. The problem is
      that aacraid currently frees some resources *before* freeing the IRQ,
      so once the free_irq() path calls the handler again (due to
      CONFIG_DEBUG_SHIRQ), aacraid crashes with a NULL pointer dereference
      and the following trace:
      
        aac_src_intr_message+0xf8/0x740 [aacraid]
        __free_irq+0x33c/0x4a0
        free_irq+0x78/0xb0
        aac_free_irq+0x13c/0x150 [aacraid]
        aac_reset_adapter+0x2e8/0x970 [aacraid]
        aac_eh_reset+0x3a8/0x5d0 [aacraid]
        scsi_try_host_reset+0x74/0x180
        scsi_eh_ready_devs+0xc70/0x1510
        scsi_error_handler+0x624/0xa20
      
      This patch prevents the crash by changing the order of the
      deinitialization in this path of aacraid: first we clear the IRQ, then
      we free other resources. No functional change intended.
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
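The ordering issue in this commit can be modeled in user space. In the sketch below every name is hypothetical: `sim_free_irq()` imitates the extra handler call that CONFIG_DEBUG_SHIRQ triggers for shared IRQs, and the handler reports -1 where the real driver would dereference a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device state: one resource the interrupt handler relies on. */
struct sim_dev {
    int *queue;            /* freed (set to NULL) during teardown */
    int irq_registered;
};

/* Returns 0 if the handler ran safely, -1 if it would dereference NULL. */
static int sim_handler(const struct sim_dev *d)
{
    if (d->queue == NULL)
        return -1;
    return 0;
}

/* Models free_irq() with the debug option: one last handler invocation. */
static int sim_free_irq(struct sim_dev *d)
{
    d->irq_registered = 0;
    return sim_handler(d);
}

static void sim_free_resources(struct sim_dev *d)
{
    d->queue = NULL;
}

/* Old order: resources first, then the IRQ. The last handler call fails. */
static int teardown_resources_first(struct sim_dev *d)
{
    sim_free_resources(d);
    return sim_free_irq(d);
}

/* New order: IRQ first, then resources. The last handler call is safe. */
static int teardown_irq_first(struct sim_dev *d)
{
    int rc = sim_free_irq(d);
    sim_free_resources(d);
    return rc;
}
```

Both orders leave the device fully torn down; only the IRQ-first order keeps the handler's data alive for the final call.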
    • scsi: aacraid: Check for PCI state of device in a generic way · bd257b2f
      Guilherme G. Piccoli committed
      Commit 16ae9dd3 ("scsi: aacraid: Fix for excessive prints on EEH")
      introduced checks on the state of the device before any PCI operations
      in the driver. Basically, this prevents it from performing PCI accesses
      while the device is recovering from a PCI error. On PowerPC, this
      mechanism is called EEH, and the aforementioned commit introduced checks
      that are based on EEH-specific primitives.
      
      This approach has three potential problems. First, these checks are
      "locked" to powerpc only, although other architectures may have error
      recovery methods too, like AER on Intel. Second, the powerpc primitives
      perform expensive FW accesses to validate the precise PCI state of a
      device. Finally, the code becomes more complicated and needs ifdef
      validation based on the arch config being set.
      
      So, this patch makes use of generic PCI state checks, which are
      lightweight and independent of arch configs. It also makes the code
      cleaner.
      
      Fixes: 16ae9dd3 ("scsi: aacraid: Fix for excessive prints on EEH")
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
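A generic, architecture-independent state check of the kind this commit describes might look like the sketch below. It is a user-space model under assumptions: the commit message does not name the helper it uses, so the mock is patterned on the kernel's `pci_channel_offline()` (which compares the device's `error_state` against the normal I/O state), and every `mock_` name is hypothetical.

```c
#include <assert.h>

/* Mock of the PCI channel states; values are illustrative. */
enum mock_channel_state {
    MOCK_IO_NORMAL = 1,      /* device is usable */
    MOCK_IO_FROZEN,          /* error recovery in progress */
    MOCK_IO_PERM_FAILURE     /* device is gone for good */
};

/* Mock of the part of a PCI device the check needs. */
struct mock_pci_dev {
    enum mock_channel_state error_state;
    unsigned int reg;        /* pretend device register */
};

/* Cheap check: no firmware call, no arch-specific ifdef. */
static int mock_channel_offline(const struct mock_pci_dev *pdev)
{
    return pdev->error_state != MOCK_IO_NORMAL;
}

/*
 * Read the register only when the channel is online.
 * Returns 0 and stores the value on success, -1 if the access was skipped.
 */
static int mock_guarded_read(const struct mock_pci_dev *pdev, unsigned int *out)
{
    if (mock_channel_offline(pdev))
        return -1;           /* device is recovering: do not touch it */

    *out = pdev->reg;
    return 0;
}
```

Because the check only inspects a field of the device structure, the same code path serves any error-recovery mechanism that updates that state.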
  5. 09 Nov, 2017 1 commit
  6. 08 Aug, 2017 2 commits
  7. 13 Jun, 2017 5 commits
  8. 27 Apr, 2017 1 commit
  9. 12 Apr, 2017 1 commit
  10. 16 Mar, 2017 1 commit
  11. 28 Feb, 2017 1 commit
  12. 24 Feb, 2017 1 commit
  13. 23 Feb, 2017 6 commits