- 24 8月, 2016 19 次提交
-
-
由 Jitendra Bhivare 提交于
UE detection in health check is done in a work scheduled in global wq. UE caused due to transient parity errors are recoverable and reported within 1s. If this check for TPE gets delayed, PF0 might initiate soft-reset and then status of UE recoverable is lost. Handle UE detection in timer routine. Move out EQ delay update work from health check. Make the IOCTL for EQ delay update non-blocking as the completion status is ignored. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
Boot work involves: 1. Find and fetch configured boot session and its handle. 2. Attempt to open the session if its not. 3. Get the session details for boot kset creation. 4. Logout of that session owned by FW. 5. Create boot kset for session details. All these actions were done in blocking call with retries in global wq. Other works in wq suffered if the IOCTLs stalled or timed out. This change moves all the boot work to make it non-blocking. The work queued in global wq just issues the IOCTL depending on the action to be taken and mcc wq schedules work depending on status of the IOCTL. Initial boot_work is started on link and ASYNC event. The other code changes move all boot related functions in one place and follow naming conventions. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
Save ue_detected and fw_timeout errors in state field of beiscsi_hba. BEISCSI_HBA_RUNNING BEISCSI_HBA_LINK_UP BEISCSI_HBA_BOOT_FOUND BEISCSI_HBA_PCI_ERR BEISCSI_HBA_FW_TIMEOUT BEISCSI_HBA_IN_UE Make sure no PCI transaction happens once in error state. Add checks in IO path to detect HBA in error. Skip hwi_purge_eq step which can't be done in error state. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
todo_mcc_cq is not needed as only MCC work is queued. todo_cq is not used at all. Rename functions to be consistent. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
alloc_mcc_tag was replaced with alloc_mcc_wrb and is no more used. beiscsi_pci_soft_reset is not used at all and won't be needed. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
Redefine FW IP types. Before issuing IOCTL to clear IP, check if IP is all zeroes. All zeroes IP implies IP is not set in FW so FW fails that IOCTL. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
Wrong settings displayed for iface: iface.header_digest = 192.168.197.22 iface.data_digest = 255.255.255.0 iface.immediate_data = 192.168.197.1 Process ISCSI_NET_PARAM only in beiscsi_iface_get_param. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
Rename mgmt_get_if_info to be consistent with APIs name. Rename create/destroy APIs to indicate IFACE operations. Remove legacy be2iscsi and use beiscsi. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
Move mgmt_get_all_if_id before any set param operation. Rename mgmt_get_all_if_id to beiscsi_if_get_handle. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
VLAN tag is L2 construct, move VLAN code out from configuring IP. Rearrange and rename the APIs to make it consistent. Replace ENOSYS with EPERM. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
If BOOTPROTO is changed to static, the DHCP IP address should be released. All cases are being handled mgmt_set_ip and mgmt_static_ip_modify. Rearrange IFACE APIs to: beiscsi_if_clr_ip beiscsi_if_set_ip beiscsi_if_en_static beiscsi_if_en_dhcp This simplifies release of DHCP IP when BOOTPROTO is set to static. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
Gateway APIs assume IP type as IPv4. Modify it to be generic to allow clearing of IPv6 gateway set using BIOS. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
ipv4_iface and ipv6_iface fields need to be set to NULL when destroyed. Before creation these are checked. Use these to report correct states. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
Driver takes significant time to load 1m:20s and unload 40s. Checkpatch script threw warning: WARNING: msleep < 20ms can sleep for up to 20ms; see Documentation/timers/timers-howto.txt To eliminate this warning msleep(1) was replaced with msleep(20) before submitting. msleep(20) in init and uninit path for creation and destroying of number of WRBQs, CQs, and EQs is adding to load/unload time. Replace msleep with schedule_timeout_uninterruptible of 1ms as its enough in most cases. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
This got unnecessarily introduced with other changes in previous commits. mcc_lock is taken only in process contexts. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Jitendra Bhivare 提交于
Following configuration is created with what driver exports: iface.vlan_id = 65535 iface.vlan_priority = 255 iface.vlan_state = <empty> vlan_state is empty as iscsiadm doesn't process "Disabled". When applying this configuration, iscsiadm checks for if vlan_state is "disable" if not it enables with value in vlan_id. 65535 not being valid value, 0 is applied. Use "enable" or "disable" for ISCSI_NET_PARAM. Signed-off-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Johannes Thumshirn 提交于
Provide a translation table between Ethernet and FC port speeds so odd speeds (from a Ethernet POV) like 8 Gbit are correctly mapped to sysfs and open-fcoe's fcoeadm. Before: Description: BCM57840 NetXtreme II 10/20-Gigabit Ethernet Revision: 11 Manufacturer: Broadcom Corporation Serial Number: 6CC2173EA1D0 Driver: bnx2x 1.712.30-0 Number of Ports: 1 Symbolic Name: bnx2fc (QLogic BCM57840) v2.10.3 over eth2 OS Device Name: host1 Node Name: 0x20006cc2173ea1d1 Port Name: 0x10006cc2173ea1d1 FabricName: 0x100000c0dd0ce717 Speed: unknown Supported Speed: 1 Gbit, 10 Gbit MaxFrameSize: 2048 bytes FC-ID (Port ID): 0x660702 State: Online After: Description: BCM57840 NetXtreme II 10/20-Gigabit Ethernet Revision: 11 Manufacturer: Broadcom Corporation Serial Number: 6CC2173EA1D0 Driver: bnx2x 1.712.30-0 Number of Ports: 1 Symbolic Name: bnx2fc (QLogic BCM57840) v2.10.3 over eth2 OS Device Name: host1 Node Name: 0x20006cc2173ea1d1 Port Name: 0x10006cc2173ea1d1 FabricName: 0x100000c0dd0ce717 Speed: 8 Gbit Supported Speed: 1 Gbit, 10 Gbit MaxFrameSize: 2048 bytes FC-ID (Port ID): 0x660701 State: Online Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NHannes Reinicke <hare@suse.de> Reviewed-by: NLee Duncan <lduncan@suse.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Matthew R. Ochs 提交于
The adapter file descriptor was previously cached within the kernel for a given context in order to support performing a close on behalf of an application. This is no longer needed as applications are now required to perform a close on the adapter file descriptor. Inspired-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Matthew R. Ochs 提交于
Caching the adapter file descriptor and performing a close on behalf of an application is a poor design. This is due to the fact that once a file descriptor in installed, it is free to be altered without the knowledge of the cxlflash driver. This can lead to inconsistencies between the application and kernel. Furthermore, the nature of the former design is more exploitable and thus should be abandoned. To support applications performing a close on the adapter file that is associated with a context, a new flag is introduced to the user API to indicate to applications that they are responsible for the close following the cleanup (detach) of a context. The documentation is also updated to reflect this change in behavior. Inspired-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 19 8月, 2016 7 次提交
-
-
由 Matthew R. Ochs 提交于
Currently, context user references are tracked via the list of LUNs that have attached to the context. While convenient, this is not intuitive without a deep study of the code and is inconsistent with the existing reference tracking patterns within the kernel. This design choice can lead to future bug injection. To improve code comprehension and better protect against future bugs, add explicit reference counting to contexts and migrate the context removal code to the kref release handler. Inspired-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Matthew R. Ochs 提交于
The context removal routine requires access to the owning adapter structure to reset the context within the AFU as part of the tear down sequence. In order to support kref adoption, the owning adapter must be accessible from the release handler. As the kref framework only provides the kref reference as the sole parameter, another means is needed to derive the owning adapter. As a remedy, the owning adapter reference is saved off within the context during initialization. Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Matthew R. Ochs 提交于
Context information structures are protected by a mutex that is held when accessing/manipulating the context. When the code that manages these structures was authored, a decision was made to include taking the mutex as part of the allocation/initialization sequence and also handle the scenario where the mutex was already held when freeing the context. While not a problem outright, this design decision has been deemed as too flexible and the code should be made more rigid to avoid future bugs. In addition, further review of the code yields that the existing mutex manipulations in both of these context management paths are superfluous. This commit removes the obtaining of the context mutex in the context initialization routine and assumes the mutex is not held in the context free path. Inspired-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: NManoj N. Kumar <manoj@linux.vnet.ibm.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Hannes Reinecke 提交于
When all exchanges are reset the upper layers have already logged out of the remote port, so the exchanges can be reset without sending any ABTS. Signed-off-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NChad Dupuis <chad.dupuis@qlogic.com> Tested-by: NChad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Hannes Reinecke 提交于
FC-LS mandates that we should invalidate all sequences before sending a LOGO. And we should set the event to RPORT_EV_STOP when a LOGO request has been received to signal that all exchanges are terminated. Signed-off-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NChad Dupuis <chad.dupuis@qlogic.com> Tested-by: NChad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Hannes Reinecke 提交于
When running in point-to-multipoint mode PLOGI is done after FLOGI completed. So when the PLOGI fails we should be sending a LOGO to the remote port. [mkp: Applied by hand] Signed-off-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NChad Dupuis <chad.dupuis@qlogic.com> Tested-by: NChad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Hannes Reinecke 提交于
When receiving a PRLO it just means that the operating parameters have changed, it does _not_ mean that the port doesn't want to communicate with us. So instead of implicitly logging out we should be issueing a PRLI to figure out the new operating parameters. We can always recover once PRLI fails. Signed-off-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NChad Dupuis <chad.dupuis@qlogic.com> Tested-by: NChad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 13 8月, 2016 1 次提交
-
-
由 Kevin Barnett 提交于
This initial commit contains Microsemi's smartpqi module. [mkp: Minor tweaks to apply to 4.9/scsi-queue] Reviewed-by: NScott Benesh <scott.benesh@microsemi.com> Reviewed-by: NKevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: NKevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: NDon Brace <don.brace@microsemi.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NEwan D. Milne <emilne@redhat.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 09 8月, 2016 6 次提交
-
-
由 Dan Carpenter 提交于
The "if (test_bit(UNLOADING..." line was indented one tab more than it should have been. There was an extra parenthesis around the qla2x00_reset_active() function call. I lined up the conditions a bit so that it shows how they group together. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Acked-by: NHimanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Johannes Thumshirn 提交于
In _scsih_io_done() we test if the ioc->logging_level does _not_ have the MPT_DEBUG_REPLY bit set and if it hasn't we print the debug messages. This unfortunately is the wrong way around. Note, the actual bug is older than af009411 but this commit removed the CONFIG_SCSI_MPT3SAS_LOGGING Kconfig option which hid the bug. Fixes: af009411 'mpt2sas, mpt3sas: Remove SCSI_MPTXSAS_LOGGING entry from Kconfig' Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de> Acked-by: NChaitra P B <chaitra.basappa@broadcom.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Calvin Owens 提交于
Trivial non-functional changes for a couple annoying things: 1) Functions local to files are not declared static, which is frustrating when reading the code because it's non-obvious at first glance what's actually called from other files. 2) Set-but-unused variables abound, presumably to mask -Wunused-result errors in the past. None of these are flagged today though (with one exception noted below), so remove them. Fixing (2) exposed the fact that we improperly ignore the return value of scsi_device_reprobe() in _scsih_reprobe_lun(). Fixing the calling code to deal with the potential error is non-trivial, so for now just WARN(). Signed-off-by: NCalvin Owens <calvinowens@fb.com> Acked-by: NChaitra P B <chaitra.basappa@broadcom.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Calvin Owens 提交于
With the exception of a single call to wait_for_doorbell_int(), all this conditional sleeping code is dead. So delete it. Signed-off-by: NCalvin Owens <calvinowens@fb.com> Acked-by: NChaitra P B <chaitra.basappa@broadcom.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Calvin Owens 提交于
This flag that conditionally acquires the mutex is confusing and prone to bugginess: refactor it into two separate function calls, and make the unlocked one complain if it's called outside the mutex. Signed-off-by: NCalvin Owens <calvinowens@fb.com> Acked-by: NChaitra P B <chaitra.basappa@broadcom.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Calvin Owens 提交于
We blindly trust the hardware to give us NUL-terminated strings, which is a bad idea because it doesn't always do that. For example: [ 481.184784] mpt3sas_cm0: enclosure level(0x0000), connector name( \x3) In this case, connector_name is four spaces. We got lucky here because the 2nd byte beyond our character array happens to be a NUL. Fix this by explicitly writing '\0' to the end of the string to ensure we don't run off the edge of the world in printk(). Signed-off-by: NCalvin Owens <calvinowens@fb.com> Acked-by: NChaitra P B <chaitra.basappa@broadcom.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 02 8月, 2016 3 次提交
-
-
由 Wei Yongjun 提交于
Fix to return error code -ENOMEM from the workqueue alloc error handling case instead of 0, as done elsewhere in this function. Signed-off-by: NWei Yongjun <weiyj.lk@gmail.com> Acked-by: NBrian King <brking@linux.vnet.ibm.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Wei Yongjun 提交于
Add the missing destroy_workqueue() before return from fcoe_init() in the fcoe transport register failed error handling case. Signed-off-by: NWei Yongjun <weiyj.lk@gmail.com> Acked-by: NJohannes Thumshirn <jth@kernel.org> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Johannes Thumshirn 提交于
Check for the existence of piocb->vport before accessing it. Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de> Acked-by: NJames Smart <james.smart@broadcom.com> Reviewed-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 27 7月, 2016 4 次提交
-
-
由 Hannes Reinecke 提交于
FC-BB-6 states: FIP protocols shall be performed on a per-VLAN basis. It is recommended to use the FIP VLAN discovery protocol on the default VLAN. Signed-off-by: NHannes Reinecke <hare@suse.com> Acked-by: NJohannes Thumshirn <jth@kernel.org> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Brian King 提交于
When performing an async scan, make sure the kthread doing scanning doesn't start before the scsi host is fully initialized. Signed-off-by: NBrian King <brking@linux.vnet.ibm.com> Reviewed-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Uma Krishnan 提交于
If an EEH or some other hard error occurs while the adapter instance was being initialized, on the subsequent shutdown of the device, the system could crash with: [c000000f1da03b60] c0000000005eccfc pci_device_shutdown+0x6c/0x100 [c000000f1da03ba0] c0000000006d67d4 device_shutdown+0x1b4/0x2c0 [c000000f1da03c40] c0000000000ea30c kernel_restart_prepare+0x5c/0x80 [c000000f1da03c70] c0000000000ea48c kernel_restart+0x2c/0xc0 [c000000f1da03ce0] c0000000000ea970 SyS_reboot+0x1c0/0x2d0 [c000000f1da03e30] c000000000009204 system_call+0x38/0xb4 This crash is due to the AFU not being mapped when the shutdown notification routine is called and is a regression that was inserted recently with Commit 704c4b0d ("cxlflash: Shutdown notify support for CXL Flash cards"). As a fix, shutdown notification should only occur when the AFU is mapped. Fixes: 704c4b0d ("cxlflash: Shutdown notify support for CXL Flash cards") Signed-off-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com> Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
The lpfc_sli4_scmd_to_wqidx_distr() function expects the scsi_cmnd 'lpfc_cmd->pCmd' not to be null, and point to the midlayer command. That's not true in the .eh_(device|target|bus)_reset_handler path, because lpfc_send_taskmgmt() sends commands not from the midlayer, so does not set 'lpfc_cmd->pCmd'. That is true in the .queuecommand path because lpfc_queuecommand() stores the scsi_cmnd from midlayer in lpfc_cmd->pCmd; and lpfc_cmd is stored by lpfc_scsi_prep_cmnd() in piocbq->context1 -- which is passed to lpfc_sli4_scmd_to_wqidx_distr() as lpfc_cmd parameter. This problem can be hit on SCSI EH, and immediately with sg_reset. These 2 test-cases demonstrate the problem/fix with next-20160601. Test-case 1) sg_reset # strace sg_reset --device /dev/sdm <...> open("/dev/sdm", O_RDWR|O_NONBLOCK) = 3 ioctl(3, SG_SCSI_RESET, 0x3fffde6d0994 <unfinished ...> +++ killed by SIGSEGV +++ Segmentation fault # dmesg Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xd00000001c88442c Oops: Kernel access of bad area, sig: 11 [#1] <...> CPU: 104 PID: 16333 Comm: sg_reset Tainted: G W 4.7.0-rc1-next-20160601-00004-g95b89dc #6 <...> NIP [d00000001c88442c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc] LR [d00000001c826fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc] Call Trace: [c000003c9ec876f0] [c000003c9ec87770] 0xc000003c9ec87770 (unreliable) [c000003c9ec87720] [d00000001c82e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc] [c000003c9ec87780] [d00000001c831a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 [lpfc] [c000003c9ec87880] [d00000001c87f27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc] [c000003c9ec87950] [d00000001c87fd7c] lpfc_device_reset_handler+0x10c/0x200 [lpfc] [c000003c9ec87a10] [c000000000610694] scsi_try_bus_device_reset+0x44/0xc0 [c000003c9ec87a40] [c0000000006113e8] scsi_ioctl_reset+0x198/0x2c0 [c000003c9ec87bf0] [c00000000060fe5c] scsi_ioctl+0x13c/0x4b0 [c000003c9ec87c80] [c0000000006629b0] sd_ioctl+0xf0/0x120 [c000003c9ec87cd0] [c00000000046e4f8] blkdev_ioctl+0x248/0xb70 [c000003c9ec87d30] [c0000000002a1f60] block_ioctl+0x70/0x90 [c000003c9ec87d50] [c00000000026d334] do_vfs_ioctl+0xc4/0x890 [c000003c9ec87de0] [c00000000026db60] SyS_ioctl+0x60/0xc0 [c000003c9ec87e30] [c000000000009120] system_call+0x38/0x108 Instruction dump: <...> With fix: # strace sg_reset --device /dev/sdm <...> open("/dev/sdm", O_RDWR|O_NONBLOCK) = 3 ioctl(3, SG_SCSI_RESET, 0x3fffe103c554) = 0 close(3) = 0 exit_group(0) = ? +++ exited with 0 +++ # dmesg [ 424.658649] lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset (1, 0) return x2002 Test-case 2) SCSI EH Using this debug patch to wire an SCSI EH trigger, for lpfc_scsi_cmd_iocb_cmpl(): - cmd->scsi_done(cmd); + if ((phba->pport ? phba->pport->cfg_log_verbose : phba->cfg_log_verbose) == 0x32100000) + printk(KERN_ALERT "lpfc: skip scsi_done()\n"); + else + cmd->scsi_done(cmd); # echo 0x32100000 > /sys/class/scsi_host/host11/lpfc_log_verbose # dd if=/dev/sdm of=/dev/null iflag=direct & <...> After a while: # dmesg lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 839909376 (x32100000) lpfc: skip scsi_done() <...> Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xd0000000199e448c Oops: Kernel access of bad area, sig: 11 [#1] <...> CPU: 96 PID: 28556 Comm: scsi_eh_11 Tainted: G W 4.7.0-rc1-next-20160601-00004-g95b89dc #6 <...> NIP [d0000000199e448c] lpfc_sli4_scmd_to_wqidx_distr+0xc/0xd0 [lpfc] LR [d000000019986fe8] lpfc_sli_calc_ring.part.27+0x98/0xd0 [lpfc] Call Trace: [c000000ff0d0b890] [c000000ff0d0b900] 0xc000000ff0d0b900 (unreliable) [c000000ff0d0b8c0] [d00000001998e004] lpfc_sli_issue_iocb+0xd4/0x260 [lpfc] [c000000ff0d0b920] [d000000019991a3c] lpfc_sli_issue_iocb_wait+0x15c/0x5b0 [lpfc] [c000000ff0d0ba20] [d0000000199df27c] lpfc_send_taskmgmt+0x24c/0x650 [lpfc] [c000000ff0d0baf0] [d0000000199dfd7c] lpfc_device_reset_handler+0x10c/0x200 [lpfc] [c000000ff0d0bbb0] [c000000000610694] scsi_try_bus_device_reset+0x44/0xc0 [c000000ff0d0bbe0] [c0000000006126cc] scsi_eh_ready_devs+0x49c/0x9c0 [c000000ff0d0bcb0] [c000000000614160] scsi_error_handler+0x580/0x680 [c000000ff0d0bd80] [c0000000000ae848] kthread+0x108/0x130 [c000000ff0d0be30] [c0000000000094a8] ret_from_kernel_thread+0x5c/0xb4 Instruction dump: <...> With fix: # dmesg lpfc 0006:01:00.4: 4:(0):3053 lpfc_log_verbose changed from 0 (x0) to 839909376 (x32100000) lpfc: skip scsi_done() <...> lpfc 0006:01:00.4: 4:(0):0713 SCSI layer issued Device Reset (0, 0) return x2002 <...> lpfc 0006:01:00.4: 4:(0):0723 SCSI layer issued Target Reset (1, 0) return x2002 <...> lpfc 0006:01:00.4: 4:(0):0714 SCSI layer issued Bus Reset Data: x2002 <...> lpfc 0006:01:00.4: 4:(0):3172 SCSI layer issued Host Reset Data: <...> Fixes: 8b0dff14 ("lpfc: Add support for using block multi-queue") Cc: <stable@vger.kernel.org> # v4.2+ Signed-off-by: NMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Acked-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-