- 03 10月, 2011 2 次提交
-
-
由 Mark Salyzyn 提交于
When a wide port is being utilized to a target, if one disables only one of the phys, we get an OS crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000238 IP: [<ffffffff814ca9b1>] mutex_lock+0x21/0x50 PGD 4103f5067 PUD 41dba9067 PMD 0 Oops: 0002 [#1] SMP last sysfs file: /sys/bus/pci/slots/5/address CPU 0 Modules linked in: pm8001(U) ses enclosure fuse nfsd exportfs autofs4 ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl auth_rpcgss 8021q fcoe libfcoe garp libfc scsi_transport_fc stp scsi_tgt llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 sr_mod cdrom dm_mirror dm_region_hash dm_log uinput sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e mlx4_ib ib_mad ib_core mlx4_en mlx4_core ext3 jbd mbcache sd_mod crc_t10dif usb_storage ata_generic pata_acpi ata_piix libsas(U) scsi_transport_sas dm_mod [last unloaded: pm8001] Modules linked in: pm8001(U) ses enclosure fuse nfsd exportfs autofs4 ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl auth_rpcgss 8021q fcoe libfcoe garp libfc scsi_transport_fc stp scsi_tgt llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 sr_mod cdrom dm_mirror dm_region_hash dm_log uinput sg i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e mlx4_ib ib_mad ib_core mlx4_en mlx4_core ext3 jbd mbcache sd_mod crc_t10dif usb_storage ata_generic pata_acpi ata_piix libsas(U) scsi_transport_sas dm_mod [last unloaded: pm8001] Pid: 5146, comm: scsi_wq_5 Not tainted 2.6.32-71.29.1.el6.lustre.7.x86_64 #1 Storage Server RIP: 0010:[<ffffffff814ca9b1>] [<ffffffff814ca9b1>] mutex_lock+0x21/0x50 RSP: 0018:ffff8803e4e33d30 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000238 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff8803e664c800 RDI: 0000000000000238 RBP: ffff8803e4e33d40 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000238 R14: ffff88041acb7200 R15: ffff88041c51ada0 FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000238 CR3: 0000000410143000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process scsi_wq_5 (pid: 5146, threadinfo ffff8803e4e32000, task ffff8803e4e294a0) Stack: ffff8803e664c800 0000000000000000 ffff8803e4e33d70 ffffffffa001f06e <0> ffff8803e4e33d60 ffff88041c51ada0 ffff88041acb7200 ffff88041bc0aa00 <0> ffff8803e4e33d90 ffffffffa0032b6c 0000000000000014 ffff88041acb7200 Call Trace: [<ffffffffa001f06e>] sas_port_delete_phy+0x2e/0xa0 [scsi_transport_sas] [<ffffffffa0032b6c>] sas_unregister_devs_sas_addr+0xac/0xe0 [libsas] [<ffffffffa0034914>] sas_ex_revalidate_domain+0x204/0x330 [libsas] [<ffffffffa00307f0>] ? sas_revalidate_domain+0x0/0x90 [libsas] [<ffffffffa0030855>] sas_revalidate_domain+0x65/0x90 [libsas] [<ffffffff8108c7d0>] worker_thread+0x170/0x2a0 [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8108c660>] ? worker_thread+0x0/0x2a0 [<ffffffff81091b36>] kthread+0x96/0xa0 [<ffffffff810141ca>] child_rip+0xa/0x20 [<ffffffff81091aa0>] ? kthread+0x0/0xa0 [<ffffffff810141c0>] ? child_rip+0x0/0x20 Code: ff ff 85 c0 75 ed eb d6 66 90 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 0f 1f 44 00 00 48 89 fb e8 92 f4 ff ff 48 89 df <f0> ff 0f 79 05 e8 25 00 00 00 65 48 8b 04 25 08 cc 00 00 48 2d RIP [<ffffffff814ca9b1>] mutex_lock+0x21/0x50 RSP <ffff8803e4e33d30> CR2: 0000000000000238 The following patch is admittedly a band-aid, and does not solve the root cause, but it still is a good candidate for hardening as a pointer check before reference. Signed-off-by: NMark Salyzyn <mark_salyzyn@us.xyratex.com> Tested-by: NJack Wang <jack_wang@usish.com> Cc: stable@kernel.org Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Roland Dreier 提交于
I hit a crash in qla2x00_abort_all_cmds() if the qla2xxx module is unloaded right after it is loaded. I debugged this down to the abort handling improperly treating a command of type SRB_ADISC_CMD as if it had a bsg_job to complete when that command actually uses the iocb_cmd part of the union. (I guess to hit this one has to unload the module while the async FC initialization is still in progress) It seems we should only look for a bsg_job if type is SRB_ELS_CMD_RPT, SRB_ELS_CMD_HST or SRB_CT_CMD, so switch the test to make that explicit. Signed-off-by: NRoland Dreier <roland@purestorage.com> Acked-by: NChad Dupuis <chad.dupuis@qlogic.com> Cc: stable@kernel.org Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 26 9月, 2011 2 次提交
-
-
由 James Bottomley 提交于
Following reports on the list, it looks like the 3e-9xxx driver will leak dma mappings every time we get a transient queueing error back from the card. This is because it maps the sg list in the routine that sends the command, but doesn't unmap again in the transient failure path (even though the command is sent back to the block layer). Fix by unmapping before returning the status. Reported-by: NChris Boot <bootc@bootc.net> Tested-by: NChris Boot <bootc@bootc.net> Acked-by: NAdam Radford <aradford@gmail.com> Cc: stable@kernel.org Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Neil Horman 提交于
This oops was reported recently: d:mon> e cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120] pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3] lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i] sp: c0000000fd4c73a0 msr: 8000000000009032 dar: 0 dsisr: 40000000 current = 0xc0000000fd640d40 paca = 0xc00000000054ff80 pid = 5085, comm = iscsid d:mon> t [c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i] [c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi] [c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18 [scsi_transport_iscsi2] [c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4 [c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c [c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c [c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8 [c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac [c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c [c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40 --- Exception: c01 (System Call) at 00000080da560cfc The root cause was an EEH error, which sent us down the offload_close path in the cxgb3 driver, which in turn sets cdev->l2opt to NULL, without regard for upper layer driver (like the cxgbi drivers) which might have execution contexts in the middle of its use. The result is the oops above, when t3_l2t_get attempts to dereference L2DATA(cdev)->nentries in arp_hash right after the EEH error handler sets it to NULL. The fix is to prevent the setting of the NULL pointer until after there are no further users of it. The t3cdev->l2opt pointer is now converted to be an rcu pointer and the L2DATA macro is now called under the protection of the rcu_read_lock(). When the EEH error path: t3_adapter_error->offload_close->cxgb3_offload_deactivate Is exectured, setting of that l2opt pointer to NULL, is now gated on an rcu quiescence point, preventing, allowing L2DATA callers to safely check for a NULL pointer without concern that the underlying data will be freeded before the pointer is dereferenced. This has been tested by the reporter and shown to fix the reproted oops [nhorman: fix up unitinialised variable reported by Dan Carpenter] Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Reviewed-by: NKaren Xie <kxie@chelsio.com> Cc: stable@kernel.org Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 24 9月, 2011 2 次提交
-
-
由 Randy Dunlap 提交于
sector_t can be different types, so cast it to its largest possible type. drivers/scsi/qla2xxx/qla_isr.c:1509:5: warning: format '%lx' expects type 'long unsigned int', but argument 5 has type 'sector_t' Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Randy Dunlap 提交于
SCSI_ISCI needs to select SCSI_SAS_HOST_SMP to ensure that all needed symbols are available to it. Fixes this build error: ERROR: "try_test_sas_gpio_gp_bit" [drivers/scsi/isci/isci.ko] undefined! Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 9月, 2011 3 次提交
-
-
由 Randy Dunlap 提交于
qla4xxx driver needs to be linked with libiscsi.o to fix build errors. This happens when no other drivers that use libiscsi.o are enabled. ERROR: "iscsi_conn_stop" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_conn_get_addr_param" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_session_teardown" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_host_alloc" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_conn_start" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_conn_send_pdu" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_session_get_param" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_conn_get_param" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_set_param" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_session_failure" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_complete_pdu" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_session_setup" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_conn_bind" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_conn_setup" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! ERROR: "iscsi_itt_to_task" [drivers/scsi/qla4xxx/qla4xxx.ko] undefined! Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net> Reviewed-by: NMike Christie <michaelc@cs.wisc.edu> Cc: stable@kernel.org Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Mark Salyzyn 提交于
In an enclosure model where there are chaining expanders to a large body of storage, it was discovered that libsas, responding to a broadcast event change, would only revalidate the domain of first child expander in the list. The issue is that the pointer value to the discovered source device was used to break out of the loop, rather than the content of the pointer. This still remains non-compliant as the revalidate domain code is supposed to loop through all child expanders, and not stop at the first one it finds that reports a change count. However, the design of this routine does not allow multiple device discoveries and that would be a more complicated set of patches reserved for another day. We are fixing the glaring bug rather than refactoring the code. Signed-off-by: NMark Salyzyn <msalyzyn@us.xyratex.com> Cc: stable@kernel.org Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Vasily Averin 提交于
scsi reset on hardware with enabled MSI interrupts generates WARNING message [11027.798722] aacraid: Host adapter abort request (0,0,0,0) [11027.798814] aacraid: Host adapter reset request. SCSI hang ? [11087.762237] aacraid: SCSI bus appears hung [11135.082543] ------------[ cut here ]------------ [11135.082646] WARNING: at drivers/pci/msi.c:658 pci_enable_msi_block+0x251/0x290() Signed-off-by: NVasily Averin <vvs@sw.ru> Acked-by: NMark Salyzyn <mark_salyzyn@us.xyratex.com> Cc: stable@kernel.org Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 11 9月, 2011 1 次提交
-
-
由 Randy Dunlap 提交于
When CONFIG_NET is disabled, SCSI_QLA_ISCSI selects SCSI_ISCSI_ATTRS, which uses network interfaces, so the build fails with multiple errors: warning: (ISCSI_TCP && SCSI_CXGB3_ISCSI && SCSI_CXGB4_ISCSI && SCSI_QLA_ISCSI && INFINIBAND_ISER) selects SCSI_ISCSI_ATTRS which has unmet direct dependencies (SCSI && NET) ERROR: "skb_trim" [drivers/scsi/scsi_transport_iscsi.ko] undefined! ERROR: "netlink_kernel_create" [drivers/scsi/scsi_transport_iscsi.ko] undefined! ERROR: "netlink_kernel_release" [drivers/scsi/scsi_transport_iscsi.ko] undefined! ... so make SCSI_QLA_ISCSI also depend on NET to prevent the build errors. Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net> Cc: Ravi Anand <ravi.anand@qlogic.com> Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Cc: iscsi-driver@qlogic.com Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 8月, 2011 5 次提交
-
-
由 Eddie Wai 提交于
The iscsi_nopout task's TTT is defined as __be32 while the DMA memory to the chip is CPU specific. This creates a problem for unsolicited NOP-In responses where the TTT is not the RESERVED tag of 0xFFs. This patch adds a call to be32_to_cpu for the TTT specified. Signed-off-by: NEddie Wai <eddie.wai@broadcom.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Yi Zou 提交于
In commit 6a716a85, while releasing the DDP context in case frame_send() failed, the frame may already be freed, so we should store the pointer to fc_fcp_pkt and release the DDP context using the locally stored fsp instead of getting fsp from the fr_fsp(fp) on a frame. Signed-off-by: NYi Zou <yi.zou@intel.com> Reported-by: NBhanu Prakash Gollapudi <bprakash@broadcom.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Vasu Dev 提交于
Call fc_block_scsi_eh() in all fcoe eh to blocks the scsi_eh thread for blocked rports. Signed-off-by: NVasu Dev <vasu.dev@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Reviewed-by: NYi Zou <yi.zou@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Vasu Dev 提交于
Current fc_eh_host_reset leaves lport offline permanently due to FLOGI response getting handled by LOGO response from last reset as both had same exchange id. So fix this by having end to end exches clean-up using exchange abort along exches reset done from fc_eh_host_reset. This would avoid exchanges collision between the sessions across the reset. In this case implicit login should have done that but no aborting support for FIP frames, so just wait till lport->r_a_tov before restarting next flogi to ensure all exchanges are good to use again for next session. Below is the trace of LOGO from older session coming ahead of FLOGI response with same exche id 0x203:- 617 86.435165 4e.00.0b -> ff.ff.fc FC ELS LOGO 0x203 618 86.435195 4e.00.0b -> b6.02.00 FC ELS LOGO 0x213 619 86.435220 4e.00.0b -> 18.03.00 FC ELS LOGO 0x223 620 86.435244 4e.00.0b -> 18.02.00 FC ELS LOGO 0x233 621 86.435267 4e.00.0b -> 18.01.00 FC ELS LOGO 0x243 622 86.435349 00.00.00 -> ff.ff.fe FC ELS FLOGI 0x203 623 86.435549 ff.ff.fc -> 4e.00.0b FC ELS ACC (LOGO) 0x203 624 86.438721 ff.ff.fe -> 4e.00.0b FC ELS ACC (FLOGI) 0x203 625 86.442059 18.03.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x223 626 86.443683 b6.02.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x213 627 86.447693 18.01.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x243 628 86.453499 18.02.00 -> 4e.00.0b FC ELS ACC (LOGO) 0x233 Signed-off-by: NVasu Dev <vasu.dev@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Reviewed-by: NYi Zou <yi.zou@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Robert Love 提交于
The rtnl cannot be held durrng the fcoe_interface_put. If it is the last reference on the fcoe_interface the fcoe_ctlr_destroy will be called as a part of the cleanup, ultimately calling cancel_work_sync(&fip->recv_work); If we are processing a flogi response we will be in the recv_work context and we will lock the rtnl to add a new unicast MAC address. This is how the deadlock can occur. The fix is simply to move the rtnl_lock/unlock into fcoe_interface_cleanup so that it can be unlocked before fcoe_interface_put is called. Here is the lockdep report: Jul 21 11:26:35 bubba [ 223.870702] ul 21 11:26:35 bubba [ 223.870704] ======================================================= Jul 21 11:26:35 bubba [ 223.871255] [ INFO: possible circular locking dependency detected ] Jul 21 11:26:35 bubba [ 223.871530] 3.0.0-rc7+ #1 Jul 21 11:26:35 bubba [ 223.871797] ------------------------------------------------------- Jul 21 11:26:35 bubba [ 223.872072] lockdeptest.sh/3464 is trying to acquire lock: Jul 21 11:26:35 bubba [ 223.872345] ((&fip->recv_work) Jul 21 11:26:35 bubba ){+.+.+.} Jul 21 11:26:35 bubba , at: Jul 21 11:26:35 bubba [<ffffffff810531f1>] wait_on_work+0x0/0xbd Jul 21 11:26:35 bubba [ 223.873022] Jul 21 11:26:35 bubba [ 223.873023] but task is already holding lock: Jul 21 11:26:35 bubba [ 223.873555] (rtnl_mutex Jul 21 11:26:35 bubba ){+.+.+.} Jul 21 11:26:35 bubba , at: Jul 21 11:26:35 bubba [<ffffffff813e8233>] rtnl_lock+0x12/0x14 Jul 21 11:26:35 bubba [ 223.874229] Jul 21 11:26:35 bubba [ 223.874230] which lock already depends on the new lock. Jul 21 11:26:35 bubba [ 223.874231] Jul 21 11:26:35 bubba [ 223.875032] Jul 21 11:26:35 bubba [ 223.875033] the existing dependency chain (in reverse order) is: Jul 21 11:26:35 bubba [ 223.875573] Jul 21 11:26:35 bubba [ 223.875573] -> #1 Jul 21 11:26:35 bubba (rtnl_mutex Jul 21 11:26:35 bubba ){+.+.+.} Jul 21 11:26:35 bubba : Jul 21 11:26:35 bubba [ 223.876301] Jul 21 11:26:35 bubba [<ffffffff8106c14a>] lock_acquire+0xd2/0xf7 Jul 21 11:26:35 bubba [ 223.876645] Jul 21 11:26:35 bubba [<ffffffff8151d975>] __mutex_lock_common+0x47/0x30d Jul 21 11:26:35 bubba [ 223.876991] Jul 21 11:26:35 bubba [<ffffffff8151dd36>] mutex_lock_nested+0x3b/0x40 Jul 21 11:26:35 bubba [ 223.877334] Jul 21 11:26:35 bubba [<ffffffff813e8233>] rtnl_lock+0x12/0x14 Jul 21 11:26:35 bubba [ 223.877675] Jul 21 11:26:35 bubba [<ffffffffa003d5a0>] fcoe_update_src_mac+0x2b/0x80 [fcoe] Jul 21 11:26:35 bubba [ 223.878022] Jul 21 11:26:35 bubba [<ffffffffa003d698>] fcoe_flogi_resp+0x5e/0x79 [fcoe] Jul 21 11:26:35 bubba [ 223.878366] Jul 21 11:26:35 bubba [<ffffffffa001566f>] fc_exch_recv+0x7f5/0x9da [libfc] Jul 21 11:26:35 bubba [ 223.878713] Jul 21 11:26:35 bubba [<ffffffffa00327d8>] fcoe_ctlr_recv_work+0x71f/0x10dc [libfcoe] Jul 21 11:26:35 bubba [ 223.879258] Jul 21 11:26:35 bubba [<ffffffff81053761>] process_one_work+0x1d7/0x347 Jul 21 11:26:35 bubba [ 223.879601] Jul 21 11:26:35 bubba [<ffffffff81054ade>] worker_thread+0xf8/0x17c Jul 21 11:26:35 bubba [ 223.879944] Jul 21 11:26:35 bubba [<ffffffff81058184>] kthread+0x7d/0x85 Jul 21 11:26:35 bubba [ 223.880287] Jul 21 11:26:35 bubba [<ffffffff81526414>] kernel_thread_helper+0x4/0x10 Jul 21 11:26:35 bubba [ 223.880634] Jul 21 11:26:35 bubba [ 223.880635] -> #0 Jul 21 11:26:35 bubba ((&fip->recv_work) Jul 21 11:26:35 bubba ){+.+.+.} Jul 21 11:26:35 bubba : Jul 21 11:26:35 bubba [ 223.881357] Jul 21 11:26:35 bubba [<ffffffff8106b93e>] __lock_acquire+0xb1d/0xe2c Jul 21 11:26:35 bubba [ 223.881695] Jul 21 11:26:35 bubba [<ffffffff8106c14a>] lock_acquire+0xd2/0xf7 Jul 21 11:26:35 bubba [ 223.882033] Jul 21 11:26:35 bubba [<ffffffff81053241>] wait_on_work+0x50/0xbd Jul 21 11:26:35 bubba [ 223.882378] Jul 21 11:26:35 bubba [<ffffffff81053b32>] __cancel_work_timer+0xb6/0xf4 Jul 21 11:26:35 bubba [ 223.882718] Jul 21 11:26:35 bubba [<ffffffff81053b8a>] cancel_work_sync+0xb/0xd Jul 21 11:26:35 bubba [ 223.883057] Jul 21 11:26:35 bubba [<ffffffffa00317e6>] fcoe_ctlr_destroy+0x1d/0x67 [libfcoe] Jul 21 11:26:35 bubba [ 223.883399] Jul 21 11:26:35 bubba [<ffffffffa003e51e>] fcoe_interface_release+0x21/0x45 [fcoe] Jul 21 11:26:35 bubba [ 223.883940] Jul 21 11:26:35 bubba [<ffffffff811fbbe6>] kref_put+0x43/0x4d Jul 21 11:26:35 bubba [ 223.884280] Jul 21 11:26:35 bubba [<ffffffffa003ebba>] fcoe_interface_put+0x17/0x19 [fcoe] Jul 21 11:26:35 bubba [ 223.884624] Jul 21 11:26:35 bubba [<ffffffffa003f2a6>] fcoe_interface_cleanup+0x188/0x193 [fcoe] Jul 21 11:26:35 bubba [ 223.885163] Jul 21 11:26:35 bubba [<ffffffffa003f303>] fcoe_destroy+0x52/0x72 [fcoe] Jul 21 11:26:35 bubba [ 223.885502] Jul 21 11:26:35 bubba [<ffffffffa00340a4>] fcoe_transport_destroy+0xab/0x110 [libfcoe] Jul 21 11:26:35 bubba [ 223.886045] Jul 21 11:26:35 bubba [<ffffffff81056153>] param_attr_store+0x43/0x62 Jul 21 11:26:35 bubba [ 223.886385] Jul 21 11:26:35 bubba [<ffffffff8105602d>] module_attr_store+0x21/0x25 Jul 21 11:26:35 bubba [ 223.886728] Jul 21 11:26:35 bubba [<ffffffff8114c23d>] sysfs_write_file+0x103/0x13f Jul 21 11:26:35 bubba [ 223.887068] Jul 21 11:26:35 bubba [<ffffffff810f3e7b>] vfs_write+0xa7/0xfa Jul 21 11:26:35 bubba [ 223.887406] Jul 21 11:26:35 bubba [<ffffffff810f4073>] sys_write+0x45/0x69 Jul 21 11:26:35 bubba [ 223.887742] Jul 21 11:26:35 bubba [<ffffffff815252bb>] system_call_fastpath+0x16/0x1b Jul 21 11:26:35 bubba [ 223.888083] Jul 21 11:26:35 bubba [ 223.888084] other info that might help us debug this: Jul 21 11:26:35 bubba [ 223.888085] Jul 21 11:26:35 bubba [ 223.888879] Possible unsafe locking scenario: Jul 21 11:26:35 bubba [ 223.888881] Jul 21 11:26:35 bubba [ 223.889411] CPU0 CPU1 Jul 21 11:26:35 bubba [ 223.889683] ---- ---- Jul 21 11:26:35 bubba [ 223.889955] lock( Jul 21 11:26:35 bubba rtnl_mutex Jul 21 11:26:35 bubba ); Jul 21 11:26:35 bubba [ 223.890349] lock( Jul 21 11:26:35 bubba (&fip->recv_work) Jul 21 11:26:35 bubba ); Jul 21 11:26:35 bubba [ 223.890751] lock( Jul 21 11:26:35 bubba rtnl_mutex Jul 21 11:26:35 bubba ); Jul 21 11:26:35 bubba [ 223.891154] lock( Jul 21 11:26:35 bubba (&fip->recv_work) Jul 21 11:26:35 bubba ); Jul 21 11:26:35 bubba [ 223.891549] Jul 21 11:26:35 bubba [ 223.891550] *** DEADLOCK *** Jul 21 11:26:35 bubba [ 223.891551] Jul 21 11:26:35 bubba [ 223.892347] 6 locks held by lockdeptest.sh/3464: Jul 21 11:26:35 bubba [ 223.892621] #0: Jul 21 11:26:35 bubba (&buffer->mutex Jul 21 11:26:35 bubba ){+.+.+.} Jul 21 11:26:35 bubba , at: Jul 21 11:26:35 bubba [<ffffffff8114c171>] sysfs_write_file+0x37/0x13f Jul 21 11:26:35 bubba [ 223.893359] #1: Jul 21 11:26:35 bubba (s_active Jul 21 11:26:35 bubba ){++++.+} Jul 21 11:26:35 bubba , at: Jul 21 11:26:35 bubba [<ffffffff8114c21c>] sysfs_write_file+0xe2/0x13f Jul 21 11:26:35 bubba [ 223.894094] #2: Jul 21 11:26:35 bubba (param_lock Jul 21 11:26:35 bubba ){+.+.+.} Jul 21 11:26:35 bubba , at: Jul 21 11:26:35 bubba [<ffffffff81056146>] param_attr_store+0x36/0x62 Jul 21 11:26:35 bubba [ 223.894835] #3: Jul 21 11:26:35 bubba (ft_mutex Jul 21 11:26:35 bubba ){+.+.+.} Jul 21 11:26:35 bubba , at: Jul 21 11:26:35 bubba [<ffffffffa0034017>] fcoe_transport_destroy+0x1e/0x110 [libfcoe] Jul 21 11:26:35 bubba [ 223.895574] #4: Jul 21 11:26:35 bubba (fcoe_config_mutex Jul 21 11:26:35 bubba ){+.+.+.} Jul 21 11:26:35 bubba , at: Jul 21 11:26:35 bubba [<ffffffffa003f2c9>] fcoe_destroy+0x18/0x72 [fcoe] Jul 21 11:26:35 bubba [ 223.896314] #5: Jul 21 11:26:35 bubba (rtnl_mutex Jul 21 11:26:35 bubba ){+.+.+.} Jul 21 11:26:35 bubba , at: Jul 21 11:26:35 bubba [<ffffffff813e8233>] rtnl_lock+0x12/0x14 Jul 21 11:26:35 bubba [ 223.897047] Jul 21 11:26:35 bubba [ 223.897048] stack backtrace: Jul 21 11:26:35 bubba [ 223.897578] Pid: 3464, comm: lockdeptest.sh Not tainted 3.0.0-rc7+ #1 Jul 21 11:26:35 bubba [ 223.897853] Call Trace: Jul 21 11:26:35 bubba [ 223.898128] [<ffffffff81068e16>] print_circular_bug+0x1f8/0x209 Jul 21 11:26:35 bubba [ 223.898416] [<ffffffff8106b93e>] __lock_acquire+0xb1d/0xe2c Jul 21 11:26:35 bubba [ 223.898699] [<ffffffff810531f1>] ? wait_on_cpu_work+0xe6/0xe6 Jul 21 11:26:35 bubba [ 223.898982] [<ffffffff8106c14a>] lock_acquire+0xd2/0xf7 Jul 21 11:26:35 bubba [ 223.899263] [<ffffffff810531f1>] ? wait_on_cpu_work+0xe6/0xe6 Jul 21 11:26:35 bubba [ 223.899547] [<ffffffff8104a097>] ? mod_timer+0x8f/0x98 Jul 21 11:26:35 bubba [ 223.899827] [<ffffffff81053241>] wait_on_work+0x50/0xbd Jul 21 11:26:35 bubba [ 223.900108] [<ffffffff810531f1>] ? wait_on_cpu_work+0xe6/0xe6 Jul 21 11:26:35 bubba [ 223.900390] [<ffffffff81053b32>] __cancel_work_timer+0xb6/0xf4 Jul 21 11:26:35 bubba [ 223.900671] [<ffffffff81053b8a>] cancel_work_sync+0xb/0xd Jul 21 11:26:35 bubba [ 223.900953] [<ffffffffa00317e6>] fcoe_ctlr_destroy+0x1d/0x67 [libfcoe] Jul 21 11:26:35 bubba [ 223.901237] [<ffffffffa003e51e>] fcoe_interface_release+0x21/0x45 [fcoe] Jul 21 11:26:35 bubba [ 223.901522] [<ffffffffa003e4fd>] ? fcoe_enable+0x6b/0x6b [fcoe] Jul 21 11:26:35 bubba [ 223.901803] [<ffffffff811fbbe6>] kref_put+0x43/0x4d Jul 21 11:26:35 bubba [ 223.902083] [<ffffffffa003ebba>] fcoe_interface_put+0x17/0x19 [fcoe] Jul 21 11:26:35 bubba [ 223.902367] [<ffffffffa003f2a6>] fcoe_interface_cleanup+0x188/0x193 [fcoe] Jul 21 11:26:35 bubba [ 223.902653] [<ffffffff8151dd36>] ? mutex_lock_nested+0x3b/0x40 Jul 21 11:26:35 bubba [ 223.902939] [<ffffffffa003f303>] fcoe_destroy+0x52/0x72 [fcoe] Jul 21 11:26:35 bubba [ 223.903223] [<ffffffffa00340a4>] fcoe_transport_destroy+0xab/0x110 [libfcoe] Jul 21 11:26:35 bubba [ 223.903508] [<ffffffff81056153>] param_attr_store+0x43/0x62 Jul 21 11:26:35 bubba [ 223.903792] [<ffffffff8105602d>] module_attr_store+0x21/0x25 Jul 21 11:26:35 bubba [ 223.904075] [<ffffffff8114c23d>] sysfs_write_file+0x103/0x13f Jul 21 11:26:35 bubba [ 223.904357] [<ffffffff810f3e7b>] vfs_write+0xa7/0xfa Jul 21 11:26:35 bubba [ 223.904642] [<ffffffff810f51d6>] ? fget_light+0x35/0x96 Jul 21 11:26:35 bubba [ 223.904923] [<ffffffff810f4073>] sys_write+0x45/0x69 Jul 21 11:26:35 bubba [ 223.905204] [<ffffffff815252bb>] system_call_fastpath+0x16/0x1b Jul 21 11:26:36 bubba [ 223.964438] ixgbe 0000:05:00.0: eth3: detected SFP+: 5 Jul 21 11:26:37 bubba [ 225.196702] ixgbe 0000:05:00.0: eth3: NIC Link is Up 10 Gbps, Flow Control: None Signed-off-by: NRobert Love <robert.w.love@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Reviewed-by: NYi Zou <yi.zou@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 27 8月, 2011 11 次提交
-
-
由 Chad Dupuis 提交于
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Saurav Kashyap 提交于
The memset of the fcp_cmnd struct needs to be moved so that it will not zero-out valid data. Signed-off-by: NSaurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: NChad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Andrew Vasquez 提交于
Transitioning to a LOOP_UPDATE loop-state could cause the driver to miss normal link/target processing. LOOP_UPDATE is a crufty artifact leftover from at time the driver performed it's own internal command-queuing. Safely remove this state. Signed-off-by: NAndrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: NChad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Saurav Kashyap 提交于
Signed-off-by: NSaurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: NChad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Chad Dupuis 提交于
Close a small window where we could falsely fail an abort request if the mailbox command fails but the command was returned during interrupt context. Signed-off-by: NChad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Saurav Kashyap 提交于
The dsd list shouldn't be manipulated without taking the per host hardware lock to prevent multiple callers from trampling upon one another. Signed-off-by: NSaurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: NChad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Chad Dupuis 提交于
Since we enable interrupts before initializing the firmware, use the chip revision from PCI config space directly to perform the chip revision check. Also remove the unnecessary firmware attributes test. Signed-off-by: NChad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Arun Easi 提交于
This fix: - Disables app tag peeking; correct tag check will be added when the SCSI API is available. - Always derive ref_tag from scsi_get_lba() - Removes incorrect swap of FCP_LUN in FCP_CMND - Moves app-tag error check before ref-tag check. The reason being, currently there is no interface in SCSI to retrieve the app-tag for protection I/Os, so driver puts zero for app-tag in the firmware interface, but requests not to validate it, but when a ref-tag error is detected by firmware, it would put expected/actual tags for all the protection tags (guard/app/ref). As driver checks for app tag error first, a ref-tag error is incorrectly flagged as app-tag error. - Convert HBA specific checks to capability based. Signed-off-by: NArun Easi <arun.easi@qlogic.com> Signed-off-by: NChad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Arun Easi 提交于
Driver needs to update protection bytes for uninitialized sectors as they are not DMA-d. Signed-off-by: NArun Easi <arun.easi@qlogic.com> Reviewed-by: NAndrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: NChad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Stephen M. Cameron 提交于
If a physical device exposed to the OS by hpsa is replaced (e.g. one hot plug tape drive is replaced by another, or a tape drive is placed into "OBDR" mode in which it acts like a CD-ROM device) and a rescan is initiated, the replaced device will be added to the SCSI midlayer with target and lun numbers set to -1. After that, a panic is likely to ensue. When a physical device is replaced, the lun and target number should be preserved. Signed-off-by: NStephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@kernel.org Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Stephen M. Cameron 提交于
The test to detect OBDR ("One Button Disaster Recovery") cd-rom devices was comparing against uninitialized data. Fixed by moving the test for the device to where the inquiry data is collected, and uninitialized variable altogether as it wasn't really being used. Signed-off-by: NStephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@kernel.org Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 24 8月, 2011 8 次提交
-
-
由 Dan Williams 提交于
Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Dan Williams 提交于
Hardware only increments the put pointer on event types >= 4. Do not increment the get pointer for event type 3. Reported-by: NKapil Karkra <kapil.karkra@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Dan Williams 提交于
Hardware allows both an outstanding number commands and a timeout value (whichever occurs first) as a gate to the next interrupt generation. This scheme at completion time looks at the remaining number of outstanding tasks and sets the timeout to maximize small transaction operation. If transactions are large (take more than a few 10s of microseconds to complete) then performance is not interrupt processing bound, so the small timeouts this scheme generates are overridden by the time it takes for a completion to arrive. Tested-by: NDave Jiang <dave.jiang@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Jeff Skirvin 提交于
Instead of immediately completing any request that has a second termination call made on it, wait for the TC done/abort HW event. Signed-off-by: NJeff Skirvin <jeffrey.d.skirvin@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Dave Jiang 提交于
Adding API update for adding isci_id entry scsi_host sysfs entry. Also fixing up the sysfs registration to the scsi_host template Signed-off-by: NDave Jiang <dave.jiang@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Marcin Tomczak 提交于
Need the following workaround in the driver for interoperability with the older Intel SSD drives and any other SATA drive that may exhibit the same behavior. This is a corner case where SCU speed is limited to either 3G or 1.5G and the drive has a period of DC idle when it switches speed during SATA speed negotiation. Workaround :change PHYTOV[31:24] from 0x36 to 0x3B. Signed-off-by: NMarcin Tomczak <marcin.tomczak@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Dan Williams 提交于
The unsolicited frame control infrastructure requires a table of dma addresses for the hardware to lookup the frame buffer location by an index. The hardware expects the elements of this table to be 64-bit quantities, so we cannot reference these elements as dma_addr_t. All unsolicited frame protocols are affected, particularly SATA-PIO and SMP which prevented direct-attached SATA drives and expander-attached drives to not be discovered. Cc: <stable@kernel.org> Reported-by: NJacek Danecki <jacek.danecki@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Dan Williams 提交于
A bug (likely copy/paste) that has been carried from the original implementation. The unsolicited frame handling structure returns the d2h fis in the isci_request.stp.rsp buffer. Cc: <stable@kernel.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 28 7月, 2011 6 次提交
-
-
由 Vasu Dev 提交于
Cleanup to: - have selection for all types of frames, not just FCP. - remove redundant cpu_online check once fcoe_select_cpu called as this is not required since later code flow check for offlined cpu. - Simplify fcoe_select_cpu() by removing unnecessary checks to skip curr_cpu, this also fixes possibly infinite loop in case of curr_cpu is the only cpu while iterating in the loop. This cleanup mainly applies to target as incoming request are mostly for target, therefore Kiran has verified the patch with target also. Signed-off-by: NVasu Dev <vasu.dev@intel.com> Tested-by: NKiran Patil <kiran.patil@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Vasu Dev 提交于
Use pending queue to retry FIP frame in case its tx fails and use common pending queue for both fcoe and fip frames using fcoe_port_send. Signed-off-by: NVasu Dev <vasu.dev@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Vasu Dev 提交于
The lport retry timer hits warn on in case it has become ready in response from fip login from fcoe_ctlr_flogi_send(), this is possible but safe code path, therefore removing this warn on. Jun 22 03:16:30 10.0.16.6 [488198.316517] host3: Assigned Port ID 180f02 Jun 22 03:16:32 10.0.16.6 [488200.091561] ------------[ cut here ]------------ Jun 22 03:16:32 10.0.16.6 [488200.091586] WARNING: at drivers/scsi/libfc/fc_lport.c:1355 fc_lport_timeout+0xd9/0xe0 [libfc]() Signed-off-by: NVasu Dev <vasu.dev@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Neerav Parikh 提交于
fc_queuecommand() allocates an FCP packet for each SCSI command and sends it out on the wire. In the process it stores the reference to the FCP packet in the scsi_cmnd structure. Now, in case under stress testing the libfc exchange layer runs out of exchanges the fc_queuecommand() may not be able to send out commands out on the wire. In such a scenario if there is an error in sending the FCP packet out the wire; fc_queuecommand() deletes the FCP packet from internal queue, releases the FCP packet and returns a SCSI_MLQUEUE_HOST_BUSY status to the scsi-ml. But, the reference to the FCP packet set in the scsi_cmnd is not removed from the scsi_cmnd in this code path. This might lead to a crash under stress testing where the scsi_cmnd failed by fc_queuecommand() comes up to fc_eh_abort() via scsi eh thread. fc_eh_abort() will get reference to the FCP packet to be aborted from the scsi_cmnd for further FCP abort related processing and then try to release the FCP packet that has already been released. This patch removes the FCP packet reference from the scsi_cmnd before returning back from fc_queuecommand() in case of an error in sending out the FCP packet. Signed-off-by: NNeerav Parikh <Neerav.Parikh@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Hillf Danton 提交于
The variable on stack, namely cdb_op, is not used but removed. [ Patch reworked by Robert Love due to invalid patch format ] Signed-off-by: NHillf Danton <dhillf@gmail.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Hillf Danton 提交于
One change is to cleanup typo in comment for fc_fcp_recv(), another corrects the misleading comment for fc_fcp_abts_resp(). [ Patch reworked by Robert Love due to invalid patch format ] Signed-off-by: NHillf Danton <dhillf@gmail.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-