Commits · f0d98d85831bf1a3b1f56f8c14af60797aaca536 · openeuler / Kernel

10 Apr, 2018 1 commit

scsi: aacraid: Insure command thread is not recursively stopped · 1c6b41fb

Committed by Dave Carroll on Apr 03, 2018

If a recursive IOP_RESET is invoked, usually due to the eh_thread
handling errors after the first reset, be sure we flag that the command
thread has been stopped to avoid an Oops of the form:

 [ 336.620256] CPU: 28 PID: 1193 Comm: scsi_eh_0 Kdump: loaded Not tainted 4.14.0-49.el7a.ppc64le #1
 [ 336.620297] task: c000003fd630b800 task.stack: c000003fd61a4000
 [ 336.620326] NIP: c000000000176794 LR: c00000000013038c CTR: c00000000024bc10
 [ 336.620361] REGS: c000003fd61a7720 TRAP: 0300 Not tainted (4.14.0-49.el7a.ppc64le)
 [ 336.620395] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22084022 XER: 20040000
 [ 336.620435] CFAR: c000000000130388 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
 [ 336.620435] GPR00: c00000000013038c c000003fd61a79a0 c0000000014c7e00 0000000000000000
 [ 336.620435] GPR04: 000000000000000c 000000000000000c 9000000000009033 0000000000000477
 [ 336.620435] GPR08: 0000000000000477 0000000000000000 0000000000000000 c008000010f7d940
 [ 336.620435] GPR12: c00000000024bc10 c000000007a33400 c0000000001708a8 c000003fe3b881d8
 [ 336.620435] GPR16: c000003fe3b88060 c000003fd61a7d10 fffffffffffff000 000000000000001e
 [ 336.620435] GPR20: 0000000000000001 c000000000ebf1a0 0000000000000001 c000003fe3b88000
 [ 336.620435] GPR24: 0000000000000003 0000000000000002 c000003fe3b88840 c000003fe3b887e8
 [ 336.620435] GPR28: c000003fe3b88000 c000003fc8181788 0000000000000000 c000003fc8181700
 [ 336.620750] NIP [c000000000176794] exit_creds+0x34/0x160
 [ 336.620775] LR [c00000000013038c] __put_task_struct+0x8c/0x1f0
 [ 336.620804] Call Trace:
 [ 336.620817] [c000003fd61a79a0] [c000003fe3b88000] 0xc000003fe3b88000 (unreliable)
 [ 336.620853] [c000003fd61a79d0] [c00000000013038c] __put_task_struct+0x8c/0x1f0
 [ 336.620889] [c000003fd61a7a00] [c000000000171418] kthread_stop+0x1e8/0x1f0
 [ 336.620922] [c000003fd61a7a40] [c008000010f7448c] aac_reset_adapter+0x14c/0x8d0 [aacraid]
 [ 336.620959] [c000003fd61a7b00] [c008000010f60174] aac_eh_host_reset+0x84/0x100 [aacraid]
 [ 336.621010] [c000003fd61a7b30] [c000000000864f24] scsi_try_host_reset+0x74/0x180
 [ 336.621046] [c000003fd61a7bb0] [c000000000867ac0] scsi_eh_ready_devs+0xc00/0x14d0
 [ 336.625165] [c000003fd61a7ca0] [c0000000008699e0] scsi_error_handler+0x550/0x730
 [ 336.632101] [c000003fd61a7dc0] [c000000000170a08] kthread+0x168/0x1b0
 [ 336.639031] [c000003fd61a7e30] [c00000000000b528] ret_from_kernel_thread+0x5c/0xb4
 [ 336.645971] Instruction dump:
 [ 336.648743] 384216a0 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c7f1b78 60000000 60000000
 [ 336.657056] 39400000 e87f0838 f95f0838 7c0004ac <7d401828> 314affff 7d40192d 40c2fff4
 [ 336.663997] ---[ end trace 4640cf8d4945ad95 ]---

So flag when the thread is stopped by setting the thread pointer to NULL.

Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

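The guard described in this commit can be illustrated outside the kernel. The following is a minimal user-space sketch, not the driver's code: `fake_adapter`, `fake_thread`, `fake_thread_stop()` and `adapter_stop_thread()` are hypothetical names, and `fake_thread_stop()` merely stands in for `kthread_stop()`. It shows how clearing the pointer after the first stop lets a second, recursive reset skip the stop instead of touching a thread that is already gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel thread handle. */
struct fake_thread {
	int stop_calls;               /* how many times a stop was issued */
};

/* Hypothetical stand-in for the adapter state. */
struct fake_adapter {
	struct fake_thread *thread;   /* NULL means "already stopped" */
};

/* Stand-in for kthread_stop(): it only counts how often it is called. */
static void fake_thread_stop(struct fake_thread *t)
{
	t->stop_calls++;
}

/* Stop the command thread at most once, even if a reset is re-entered. */
static bool adapter_stop_thread(struct fake_adapter *a)
{
	assert(a != NULL);
	if (a->thread == NULL)
		return false;         /* an earlier reset already stopped it */
	fake_thread_stop(a->thread);
	a->thread = NULL;             /* flag the thread as stopped */
	return true;
}
```

Calling `adapter_stop_thread()` twice on the same adapter issues only one stop; the second call returns `false` without touching the thread.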

04 Jan, 2018 17 commits

scsi: aacraid: Remove unused rescan variable · 75be67cd

Committed by Raghava Aditya Renukunta on Dec 26, 2017

Remove unused rescan variable.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Skip schedule rescan in case of kdump · fe523759

Committed by Raghava Aditya Renukunta on Dec 26, 2017

The driver can get stuck in kdump if drives start acting up during the
kdump discovery process and the kernel decides to send eh resets, which
would prompt a rescan to be scheduled.

Do not perform a rescan in kdump context, since we do not expect a hotplug
event during kdump and all the devices are going to go away anyway.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Fix hang while scanning in eh recovery · 8a30e50b

Committed by Raghava Aditya Renukunta on Dec 26, 2017

Add back the ability to scan for hotplug changes while eh was in progress.

Schedule a rescan for a later time in the eh recovery code and wait for
eh to complete in the rescan worker.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Reschedule host scan in case of failure · a1367e4a

Committed by Raghava Aditya Renukunta on Dec 26, 2017

If the driver fails to retrieve information from the fw (could happen when
the fw is not fully in its senses), the driver does nothing and the change
is not processed correctly.

Schedule host rescan in case of failure. This is only for SAFW, since
the information retrieval failure will happen on SAFW devices.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Use hotplug handling function in place of scsi_scan_host · 8ebaa67f

Committed by Raghava Aditya Renukunta on Dec 26, 2017

The driver uses scsi_scan_host to add new devices in the driver init path,
which adds all the fw exposed devices. The driver resorts to queue
command checks to block out commands to _hidden_ devices.

Use the hotplug handler code to add new devices during driver init and
other areas; this is only for safw. For ARC, scsi_scan_host will still
apply.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Block concurrent hotplug event handling · 3395614e

Committed by Raghava Aditya Renukunta on Dec 26, 2017

Currently the driver will attempt to process hotplug events concurrently,
based on the FW interrupt.

Protect the safw update function with a scan mutex.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Merge adapter setup with resolve luns · 6f44a22b

Committed by Raghava Aditya Renukunta on Dec 26, 2017

The device hotplug events are processed only after retrieving the updated
lun information from the fw. It does not make sense to keep them separate.

Merge both the hotplug handling and safw adapter setup code into a single
function.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Refactor resolve luns code and scsi functions · 3031c656

Committed by Raghava Aditya Renukunta on Dec 26, 2017

Resolve luns checks if an sdev is already present in the os to figure
out if it needs to be removed. Internally the driver exposes HBA on bus
2 even though it's bus 1 in the fw. It's mildly confusing.

Refactor out the sdev lookup into its own function to check if the sdev has
been added to the kernel or not. Add helper functions to add, remove and put
devices based on their fw bus and target number.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Added macros to help loop through known buses and targets · 2290678f

Committed by Raghava Aditya Renukunta on Dec 26, 2017

Added macros to loop through the MAX SUPPORTED Buses and Targets. This
will make the code a bit easier to read.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

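As an illustration of what such loop macros might look like, here is a minimal sketch. The macro names (`ex_for_each_bus`, `ex_for_each_target`) and the limits (`EX_MAX_BUSES`, `EX_MAX_TARGETS`) are hypothetical; the commit message does not state the driver's actual names or maximums.

```c
#include <assert.h>

/* Hypothetical limits; the driver's real maximums are not stated here. */
#define EX_MAX_BUSES   4
#define EX_MAX_TARGETS 8

/* Loop helpers in the spirit of the commit: hide the bounds in a macro. */
#define ex_for_each_bus(bus) \
	for ((bus) = 0; (bus) < EX_MAX_BUSES; (bus)++)
#define ex_for_each_target(target) \
	for ((target) = 0; (target) < EX_MAX_TARGETS; (target)++)

/* Visit every (bus, target) pair once and return how many were visited. */
static int ex_count_bus_target_pairs(void)
{
	int bus, target, visited = 0;

	ex_for_each_bus(bus)
		ex_for_each_target(target)
			visited++;

	assert(visited == EX_MAX_BUSES * EX_MAX_TARGETS);
	return visited;
}
```

The nested loops read as "for each bus, for each target", which is the readability gain the commit message refers to.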

scsi: aacraid: Process hba and container hot plug events in single function · f2d2caba

Committed by Raghava Aditya Renukunta on Dec 26, 2017

The hotplug handler code is duplicated for hba handling and container
handling.

Merged the functions that handle hba and container hot plug events into
the resolve luns function. Added a bunch of helper functions to check the
validity of a given target and to check if a given bus and target is a
container device.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Merge func to get container information · 1d1fec53

Committed by Raghava Aditya Renukunta on Dec 26, 2017

Merge aac_get_containers into the setup target function, so that
information about all the present devices can be retrieved in one shot.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Add target setup helper function · fc0fdd9a

Committed by Raghava Aditya Renukunta on Dec 26, 2017

Add a helper function to set up target devices and create the base for the
upcoming patches.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Refactor and rename to make mirror existing changes · b5a475e9

Committed by Raghava Aditya Renukunta on Dec 26, 2017

Rename the variables and functions for bmic identify and report phy luns
so that they are consistent with the existing internal code bases.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Change phy luns function to use common bmic function · 5480aa18

Committed by Raghava Aditya Renukunta on Dec 26, 2017

Edit the function that retrieves phy lun information to use the common
bmic function.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Move code to wait for IO completion to shutdown func · 216ced02

Committed by Raghava Aditya Renukunta on Dec 26, 2017

Ideally the driver needs to wait for IO to be submitted or responded to
before shutdown.

Move the code that waits for IO completion into the shutdown path.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Do not remove offlined devices · 95900629

Committed by Raghava Aditya Renukunta on Dec 26, 2017

As part of the recovery process, the driver removes offlined devices (the
offlining is done by the kernel) and then tries to add them back in the
rescan code. Removing the device is like taking a sledgehammer to a nail.

Set the device as running if it is marked offline.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Fix hang in kdump · c5313ae8

Committed by Raghava Aditya Renukunta on Dec 26, 2017

The driver attempts to perform a device scan and device add after coming
out of reset. At times when the kdump kernel loads and it tries to perform
eh recovery, the device scan hangs since its commands are blocked because
of the eh recovery. This should have shown up in the normal eh recovery
path (should have been obvious).

Remove the code that performs scanning. I can live without the rescanning
support in the stable kernels but a hanging kdump/eh recovery needs to be
fixed.

Fixes: a2d0321d (scsi: aacraid: Reload offlined drives after controller reset)
Cc: <stable@vger.kernel.org>
Reported-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

29 Nov, 2017 1 commit

scsi: aacraid: address UBSAN warning regression · d1853975

Committed by Arnd Bergmann on Nov 28, 2017

As reported by Meelis Roos, my previous patch causes an incorrect
calculation of the timeout, through an undefined signed integer
overflow:

[   12.228155] UBSAN: Undefined behaviour in drivers/scsi/aacraid/commsup.c:2514:49
[   12.228229] signed integer overflow:
[   12.228283] 964297611 * 250 cannot be represented in type 'long int'

The problem is that doing a multiplication with HZ first and then
dividing by USEC_PER_SEC worked correctly for 32-bit microseconds,
but not for 32-bit nanoseconds, which would require up to 41 bits.

This reworks the calculation to first convert the nanoseconds into
jiffies, which should give us the same result as before and not overflow.

Unfortunately I did not understand the exact intention of the algorithm,
in particular the part where we add half a second, so it's possible that
there is still a preexisting problem in this function. I added a comment
that this would be handled more nicely using usleep_range(), which
generally works better for waking up at a particular time than the
current schedule_timeout() based implementation. I did not feel
comfortable trying to implement that without being sure what the
intent is here though.

Fixes: 820f1886 ("scsi: aacraid: use timespec64 instead of timeval")
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

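The arithmetic behind this overflow can be checked with a small user-space sketch. It is not the code from commsup.c: the function names are hypothetical, and `HZ` is assumed to be 250 only because the UBSAN report multiplies by 250. The first helper shows that multiplying a nanosecond value by `HZ` does not fit in 32 bits; the second divides by the number of nanoseconds per jiffy first, so every intermediate value stays small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HZ           250           /* assumed, matching "* 250" in the report */
#define NSEC_PER_SEC 1000000000LL

/* True if nsec * HZ would not fit in a signed 32-bit integer. */
static bool ex_mul_first_overflows32(int32_t nsec)
{
	int64_t wide = (int64_t)nsec * HZ;   /* compute safely in 64 bits */

	return wide > INT32_MAX || wide < INT32_MIN;
}

/* Convert nanoseconds to jiffies by dividing first, staying within 32 bits. */
static int32_t ex_nsec_to_jiffies32(int32_t nsec)
{
	assert(nsec >= 0 && nsec < NSEC_PER_SEC);
	return nsec / (int32_t)(NSEC_PER_SEC / HZ);
}
```

With the value from the report, 964297611 ns, the multiply-first form overflows, while the divide-first form yields 241 jiffies, the same result the overflow-free 64-bit calculation 964297611 * 250 / 1000000000 gives.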

21 Nov, 2017 2 commits

scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path · e4717292

Committed by Guilherme G. Piccoli on Nov 17, 2017

As part of the scsi EH path, aacraid performs a reinitialization of the
adapter, which encompasses freeing resources and IRQs, NULLifying lots of
pointers, and then initializing it all over again. We've identified a
problem during the free IRQ portion of this path if CONFIG_DEBUG_SHIRQ
is enabled in the kernel config file.

When this flag is set, right after free_irq() effectively clears the
interrupt, it checks whether the interrupt was requested as IRQF_SHARED.
If so, it performs another call to the driver's IRQ handler. The problem
is that aacraid currently frees some resources *before* freeing the IRQ,
so once the free_irq() path calls the handler again (due to
CONFIG_DEBUG_SHIRQ), aacraid crashes due to a NULL pointer dereference
with the following trace:

  aac_src_intr_message+0xf8/0x740 [aacraid]
  __free_irq+0x33c/0x4a0
  free_irq+0x78/0xb0
  aac_free_irq+0x13c/0x150 [aacraid]
  aac_reset_adapter+0x2e8/0x970 [aacraid]
  aac_eh_reset+0x3a8/0x5d0 [aacraid]
  scsi_try_host_reset+0x74/0x180
  scsi_eh_ready_devs+0xc70/0x1510
  scsi_error_handler+0x624/0xa20

This patch prevents the crash by changing the order of the
deinitialization in this path of aacraid: first we clear the IRQ, then
we free other resources. No functional change intended.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

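To make the ordering problem concrete, here is a small user-space simulation, offered as a sketch rather than the driver's code. All `fake_*` names and the two teardown functions are hypothetical. `fake_free_irq()` imitates the extra handler call that, per the description above, free_irq() makes when CONFIG_DEBUG_SHIRQ is set; the handler counts a fault instead of actually dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device state. */
struct fake_dev {
	int *queue;            /* resource the interrupt handler dereferences */
	int  irq_registered;   /* 1 while the handler is installed */
	int  handler_faults;   /* would-be NULL dereferences seen by the handler */
};

/* Handler: touches the queue, or records a fault if it is already gone. */
static void fake_irq_handler(struct fake_dev *d)
{
	if (d->queue == NULL) {
		d->handler_faults++;
		return;
	}
	(*d->queue)++;
}

/* Imitates free_irq() under CONFIG_DEBUG_SHIRQ: one extra handler call. */
static void fake_free_irq(struct fake_dev *d)
{
	assert(d->irq_registered);
	d->irq_registered = 0;
	fake_irq_handler(d);
}

static void fake_free_resources(struct fake_dev *d)
{
	d->queue = NULL;
}

/* Old order: resources first, then the IRQ. The late handler call faults. */
static void teardown_old_order(struct fake_dev *d)
{
	fake_free_resources(d);
	fake_free_irq(d);
}

/* New order: IRQ first, then resources. The late handler call is harmless. */
static void teardown_new_order(struct fake_dev *d)
{
	fake_free_irq(d);
	fake_free_resources(d);
}
```

Running both orders on fresh devices shows one recorded fault for the old order and none for the new one.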

scsi: aacraid: Check for PCI state of device in a generic way · bd257b2f

Committed by Guilherme G. Piccoli on Nov 17, 2017

Commit 16ae9dd3 ("scsi: aacraid: Fix for excessive prints on EEH")
introduced checks on the state of the device before any PCI operations in
the driver. Basically, this prevents the driver from performing PCI
accesses while the device is recovering from a PCI error. On PowerPC, this
mechanism is called EEH, and the aforementioned commit introduced checks
that are based on EEH-specific primitives.

There are three potential problems with this approach: first, these checks
are "locked" to powerpc only, although other archs could have error
recovery methods too, like AER on Intel. Also, the powerpc primitives
perform expensive FW accesses to validate the precise PCI state of a
device. Finally, the code becomes more complicated and needs ifdef
validation based on the arch config being set.

So, this patch makes use of generic PCI state checks, which are
lightweight and independent of arch configs; it also makes the code
cleaner.

Fixes: 16ae9dd3 ("scsi: aacraid: Fix for excessive prints on EEH")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

09 Nov, 2017 1 commit

scsi: aacraid: use timespec64 instead of timeval · 820f1886

Committed by Arnd Bergmann on Nov 07, 2017

aacraid passes the current time to the firmware in one of two ways,
either as year/month/day/... or as 32-bit unsigned seconds.

The first one is broken on 32-bit architectures as it cannot go past
year 2038. Using timespec64 here makes it behave properly on both 32-bit
and 64-bit architectures, and avoids relying on signed integer overflow
to pass times into the second interface.

The interface used in aac_send_hosttime() however is still problematic
in year 2106 when 32-bit seconds overflow. Hopefully we don't have to
worry about aacraid by that time.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

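The two limits mentioned in this message follow from the width of the seconds field. The sketch below is a user-space illustration with hypothetical names (`secs64_t`, `ex_fits_signed32()`, `ex_fits_unsigned32()`, `ex_to_fw_seconds()`); it is not the driver's aac_send_hosttime(). A 64-bit seconds count, like the one timespec64 carries, is checked against the signed 32-bit range (exhausted in 2038) and the unsigned 32-bit range (exhausted in 2106).

```c
#include <assert.h>
#include <stdint.h>

/* Seconds since the epoch as a wide signed value, as in timespec64. */
typedef int64_t secs64_t;

/* True if the value still fits a signed 32-bit field (year-2038 limit). */
static int ex_fits_signed32(secs64_t s)
{
	return s >= INT32_MIN && s <= INT32_MAX;
}

/* True if the value still fits an unsigned 32-bit field (year-2106 limit). */
static int ex_fits_unsigned32(secs64_t s)
{
	return s >= 0 && s <= UINT32_MAX;
}

/* Narrow to the 32-bit unsigned seconds a firmware interface might take. */
static uint32_t ex_to_fw_seconds(secs64_t s)
{
	assert(ex_fits_unsigned32(s));
	return (uint32_t)s;
}
```

A value of 2147483648 seconds (one past the signed 32-bit maximum) no longer fits the signed field but converts cleanly to the unsigned one; 4294967296 seconds fits neither.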

08 Aug, 2017 2 commits

scsi: aacraid: add fib flag to mark scsi command callback · c323eab7

Committed by Hannes Reinecke on Jun 30, 2017

To correctly identify which fib has a scsi command callback, this
patch implements a flag FIB_CONTEXT_FLAG_SCSI_CMD.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

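A flag of this kind is typically a single bit in a flags word. The sketch below shows the general pattern only: the flag name comes from the commit message, but its bit value (0x10 here), the `fake_fib` structure and both helper functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Name taken from the commit message; the bit value is an assumption. */
#define FIB_CONTEXT_FLAG_SCSI_CMD 0x10u

/* Hypothetical, stripped-down fib with only a flags word. */
struct fake_fib {
	uint32_t flags;
};

/* Mark a fib as carrying a scsi command callback. */
static void ex_fib_mark_scsi_cmd(struct fake_fib *f)
{
	assert(f != NULL);
	f->flags |= FIB_CONTEXT_FLAG_SCSI_CMD;
}

/* Report whether the fib was marked, leaving other flag bits untouched. */
static int ex_fib_is_scsi_cmd(const struct fake_fib *f)
{
	assert(f != NULL);
	return (f->flags & FIB_CONTEXT_FLAG_SCSI_CMD) != 0;
}
```

Because the test masks a single bit, other flags stored in the same word do not affect the result.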

scsi: aacraid: enable sending of TMFs from aac_hba_send() · b60710ec

Committed by Hannes Reinecke on Jun 30, 2017

aac_hba_send() will return FAILED for any non-SCSI command requests,
failing any TMFs. This patch updates the check to allow TMFs.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

13 Jun, 2017 5 commits

scsi: aacraid: Remove reference to Series-9 · 395e5df7

Committed by Raghava Aditya Renukunta on May 10, 2017

Remove the reference to Series-9 HBA and create an arc ctrl check function.

Signed-off-by: Prasad B Munirathnam <prasad.munirathnam@microsemi.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Make sure ioctl returns on controller reset · 8c41b9b7

Committed by Raghava Aditya Renukunta on May 10, 2017

Make sure that ioctl commands return in case of a controller reset.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Use correct function to get ctrl health · 9473ddb2

Committed by Raghava Aditya Renukunta on May 10, 2017

The command thread checks the ctrl health periodically before sending
updates to the controller. The function that it uses is aac_check_health,
which does more than get the health status.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Remove reset support from check_health · fed82007

Committed by Raghava Aditya Renukunta on May 10, 2017

Check health does not need to reset the ctrl; it only needs to return the
controller health status.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Fix DMAR issues with iommu=pt · 8105d39d

Committed by Raghava Aditya Renukunta on May 10, 2017

The driver changed the DMA consistent map after consistent memory was
allocated; this invalidated the IOMMU identity mapping. The fix was to
make sure that we set the DMA consistent mask setting once, depending on
the controller card.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

27 Apr, 2017 1 commit

scsi: aacraid: pci_alloc_consistent() failures on ARM64 · f481973d

Committed by Mahesh Rajashekhara on Apr 05, 2017

There were pci_alloc_consistent() failures on the ARM64 platform. Use
dma_alloc_coherent() with the GFP_KERNEL flag for DMA memory allocations.

Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com>
[hch: tweaked indentation, removed memsets]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

12 Apr, 2017 1 commit

scsi: aacraid: fix PCI error recovery path · 911e572e

Committed by Guilherme G. Piccoli on Apr 06, 2017

During a PCI error recovery, if aac_check_health() is not aware that a
PCI error happened and we have an offline PCI channel, it might trigger
some errors (like a NULL pointer dereference) and prevent the error
recovery process from completing.

This patch makes the health check procedure aware of PCI channel issues,
and during the error recovery process, the function
aac_adapter_check_health() returns -1 and lets the recovery process
complete successfully. This patch was tested on upstream kernel
v4.11-rc5 on the PowerPC ppc64le architecture with adapter 9005:028d
(VID:DID); the error recovery procedure was able to recover fine.

Fixes: 5c63f7f7 ("aacraid: Added EEH support")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

16 Mar, 2017 1 commit

scsi: aacraid: Fix potential null access · e498520e

Committed by Raghava Aditya Renukunta on Mar 14, 2017

Currently, the command thread fails to return ioctl commands for older
controller versions, since it returns when all the fibs have been
allocated. Another issue is that even when not all the fibs have been
allocated, the correctly allocated fibs are neither updated nor freed.

Fixes: 113156bc (scsi: aacraid: Reworked aac_command_thread)
Reported-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

28 Feb, 2017 1 commit

scsi: aacraid: remove redundant zero check on ret · fbdab3e7

Committed by Colin Ian King on Feb 24, 2017

The check for ret being zero is redundant as a few statements earlier we
break out of the while loop if ret is non-zero. Thus we can remove the
zero check and also the dead-code non-zero case too.

Detected by CoverityScan, CID#1411632 ("Logically Dead Code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

24 Feb, 2017 1 commit

scsi: aacraid: Fixed expander hotplug for SMART family · a56e5740

Committed by Raghava Aditya Renukunta on Feb 22, 2017

The current driver hotplug processing code skips over the Enclosure
channel; therefore any addition/removal of an expander enclosure is not
processed. Additionally, the device addition code relies on the older
device type, which prevents the hotplug of adapter expanders.

Fixed by removing the code that skips over Enclosure channels and using the
latest device type for addition or removal of enclosure expanders.

Fixes: 6223a39f (scsi: aacraid: Added support for hotplug)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

23 Feb, 2017 6 commits

scsi: aacraid: Fix a potential spinlock double unlock bug · d844752e

Committed by Raghava Aditya Renukunta on Feb 16, 2017

The driver does not unlock the reply queue spin lock after handling SMART
adapter events. Instead it might attempt to unlock an already unlocked
spin lock.

Fixed by making sure the driver locks the spin lock before releasing it.

Thank you, Dan, for finding this issue.

Fixes: 6223a39f (scsi: aacraid: Added support for hotplug)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Reload offlined drives after controller reset · a2d0321d

Committed by Raghava Aditya Renukunta on Feb 16, 2017

During IOP reset stress testing, it was found that drives can be
marked offline when the adapter controller crashes while IOs are running
in parallel. When the controller does come back from the reset, the drive
that is marked offline is not exposed.

Fixed by removing and adding drives that are marked offline. In addition,
invoke a scsi host bus rescan to capture any additional configuration
changes.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Skip wellness sync on controller failure · 849ac6a5

Committed by Raghava Aditya Renukunta on Feb 16, 2017

aac_command_thread checks on the health of the controller periodically,
using aac_check_health. If the status is an error state (KERNEL_PANIC or
anything else), the driver will attempt to restart the adapter, but the
response is not checked in aac_command_thread. This allows the periodic
sync to go through and leads the driver to a hung state.

Fixed by terminating the periodic loop (intended per original design)
if the controller is not restored to a healthy state.

Cc: stable@vger.kernel.org
Fixes: 3d77d840 (scsi: aacraid: Added support for periodic wellness sync)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Fix memory leak in fib init path · 1bff5abc

Committed by Raghava Aditya Renukunta on Feb 16, 2017

aac_fib_map_free frees misaligned fib dma memory; additionally, it does not
free up the whole memory.

Fixed by changing the code to free up the correct and full memory
allocation.

Cc: stable@vger.kernel.org
Fixes: e8b12f0f ([SCSI] aacraid: Add new code for PMC-Sierra's SRC based controller family)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Prevent E3 lockup when deleting units · a0c6143e

Committed by Raghava Aditya Renukunta on Feb 16, 2017

The Arrconf management utility at times sends fibs with AdapterProcessed
set. This causes the controller to panic and lock up.

Fixed by failing the commands that have AdapterProcessed set in their flags.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: aacraid: Fix for excessive prints on EEH · 16ae9dd3

Committed by Raghava Aditya Renukunta on Feb 16, 2017

This issue showed up on a kdump debug (single CPU on powerkvm), when EEH
errors rendered the adapter unusable. The driver correctly detected the
issue and attempted to restart the controller; in doing so, the driver
attempted to read the status registers of the controller. This triggered
additional eeh errors which continued for a good 6 minutes.

Fixed by returning without waiting when an EEH error is reported.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
