- 26 3月, 2013 1 次提交
-
-
由 Robert Love 提交于
We can deadlock (s_active and fcoe_config_mutex) if a port is being destroyed at the same time one is being created. [ 4200.503113] ====================================================== [ 4200.503114] [ INFO: possible circular locking dependency detected ] [ 4200.503116] 3.8.0-rc5+ #8 Not tainted [ 4200.503117] ------------------------------------------------------- [ 4200.503118] kworker/3:2/2492 is trying to acquire lock: [ 4200.503119] (s_active#292){++++.+}, at: [<ffffffff8122d20b>] sysfs_addrm_finish+0x3b/0x70 [ 4200.503127] but task is already holding lock: [ 4200.503128] (fcoe_config_mutex){+.+.+.}, at: [<ffffffffa02f3338>] fcoe_destroy_work+0xe8/0x120 [fcoe] [ 4200.503133] which lock already depends on the new lock. [ 4200.503135] the existing dependency chain (in reverse order) is: [ 4200.503136] -> #1 (fcoe_config_mutex){+.+.+.}: [ 4200.503139] [<ffffffff810c7711>] lock_acquire+0xa1/0x140 [ 4200.503143] [<ffffffff816ca7be>] mutex_lock_nested+0x6e/0x360 [ 4200.503146] [<ffffffffa02f11bd>] fcoe_enable+0x1d/0xb0 [fcoe] [ 4200.503148] [<ffffffffa02f127d>] fcoe_ctlr_enabled+0x2d/0x50 [fcoe] [ 4200.503151] [<ffffffffa02ffbe8>] store_ctlr_enabled+0x38/0x90 [libfcoe] [ 4200.503154] [<ffffffff81424878>] dev_attr_store+0x18/0x30 [ 4200.503157] [<ffffffff8122b750>] sysfs_write_file+0xe0/0x150 [ 4200.503160] [<ffffffff811b334c>] vfs_write+0xac/0x180 [ 4200.503162] [<ffffffff811b3692>] sys_write+0x52/0xa0 [ 4200.503164] [<ffffffff816d7159>] system_call_fastpath+0x16/0x1b [ 4200.503167] -> #0 (s_active#292){++++.+}: [ 4200.503170] [<ffffffff810c680f>] __lock_acquire+0x135f/0x1c90 [ 4200.503172] [<ffffffff810c7711>] lock_acquire+0xa1/0x140 [ 4200.503174] [<ffffffff8122c626>] sysfs_deactivate+0x116/0x160 [ 4200.503176] [<ffffffff8122d20b>] sysfs_addrm_finish+0x3b/0x70 [ 4200.503178] [<ffffffff8122b2eb>] sysfs_hash_and_remove+0x5b/0xb0 [ 4200.503180] [<ffffffff8122f3d1>] sysfs_remove_group+0x61/0x100 [ 4200.503183] [<ffffffff814251eb>] device_remove_groups+0x3b/0x60 [ 4200.503185] [<ffffffff81425534>] device_remove_attrs+0x44/0x80 [ 4200.503187] [<ffffffff81425e97>] device_del+0x127/0x1c0 [ 4200.503189] [<ffffffff81425f52>] device_unregister+0x22/0x60 [ 4200.503191] [<ffffffffa0300970>] fcoe_ctlr_device_delete+0xe0/0xf0 [libfcoe] [ 4200.503194] [<ffffffffa02f1b5c>] fcoe_interface_cleanup+0x6c/0xa0 [fcoe] [ 4200.503196] [<ffffffffa02f3355>] fcoe_destroy_work+0x105/0x120 [fcoe] [ 4200.503198] [<ffffffff8107ee91>] process_one_work+0x1a1/0x580 [ 4200.503203] [<ffffffff81080c6e>] worker_thread+0x15e/0x440 [ 4200.503205] [<ffffffff8108715a>] kthread+0xea/0xf0 [ 4200.503207] [<ffffffff816d70ac>] ret_from_fork+0x7c/0xb0 [ 4200.503209] other info that might help us debug this: [ 4200.503211] Possible unsafe locking scenario: [ 4200.503212] CPU0 CPU1 [ 4200.503213] ---- ---- [ 4200.503214] lock(fcoe_config_mutex); [ 4200.503215] lock(s_active#292); [ 4200.503218] lock(fcoe_config_mutex); [ 4200.503219] lock(s_active#292); [ 4200.503221] *** DEADLOCK *** [ 4200.503223] 3 locks held by kworker/3:2/2492: [ 4200.503224] #0: (fcoe){.+.+.+}, at: [<ffffffff8107ee2b>] process_one_work+0x13b/0x580 [ 4200.503228] #1: ((&port->destroy_work)){+.+.+.}, at: [<ffffffff8107ee2b>] process_one_work+0x13b/0x580 [ 4200.503232] #2: (fcoe_config_mutex){+.+.+.}, at: [<ffffffffa02f3338>] fcoe_destroy_work+0xe8/0x120 [fcoe] [ 4200.503236] stack backtrace: [ 4200.503238] Pid: 2492, comm: kworker/3:2 Not tainted 3.8.0-rc5+ #8 [ 4200.503240] Call Trace: [ 4200.503243] [<ffffffff816c2f09>] print_circular_bug+0x1fb/0x20c [ 4200.503246] [<ffffffff810c680f>] __lock_acquire+0x135f/0x1c90 [ 4200.503248] [<ffffffff810c463a>] ? debug_check_no_locks_freed+0x9a/0x180 [ 4200.503250] [<ffffffff810c7711>] lock_acquire+0xa1/0x140 [ 4200.503253] [<ffffffff8122d20b>] ? sysfs_addrm_finish+0x3b/0x70 [ 4200.503255] [<ffffffff8122c626>] sysfs_deactivate+0x116/0x160 [ 4200.503258] [<ffffffff8122d20b>] ? sysfs_addrm_finish+0x3b/0x70 [ 4200.503260] [<ffffffff8122d20b>] sysfs_addrm_finish+0x3b/0x70 [ 4200.503262] [<ffffffff8122b2eb>] sysfs_hash_and_remove+0x5b/0xb0 [ 4200.503265] [<ffffffff8122f3d1>] sysfs_remove_group+0x61/0x100 [ 4200.503273] [<ffffffff814251eb>] device_remove_groups+0x3b/0x60 [ 4200.503275] [<ffffffff81425534>] device_remove_attrs+0x44/0x80 [ 4200.503277] [<ffffffff81425e97>] device_del+0x127/0x1c0 [ 4200.503279] [<ffffffff81425f52>] device_unregister+0x22/0x60 [ 4200.503282] [<ffffffffa0300970>] fcoe_ctlr_device_delete+0xe0/0xf0 [libfcoe] [ 4200.503285] [<ffffffffa02f1b5c>] fcoe_interface_cleanup+0x6c/0xa0 [fcoe] [ 4200.503287] [<ffffffffa02f3355>] fcoe_destroy_work+0x105/0x120 [fcoe] [ 4200.503290] [<ffffffff8107ee91>] process_one_work+0x1a1/0x580 [ 4200.503292] [<ffffffff8107ee2b>] ? process_one_work+0x13b/0x580 [ 4200.503295] [<ffffffffa02f3250>] ? fcoe_if_destroy+0x230/0x230 [fcoe] [ 4200.503297] [<ffffffff81080c6e>] worker_thread+0x15e/0x440 [ 4200.503299] [<ffffffff81080b10>] ? busy_worker_rebind_fn+0x100/0x100 [ 4200.503301] [<ffffffff8108715a>] kthread+0xea/0xf0 [ 4200.503304] [<ffffffff81087070>] ? kthread_create_on_node+0x160/0x160 [ 4200.503306] [<ffffffff816d70ac>] ret_from_fork+0x7c/0xb0 [ 4200.503308] [<ffffffff81087070>] ? kthread_create_on_node+0x160/0x160 Signed-off-by: NRobert Love <robert.w.love@intel.com> Tested-by: NJack Morgan <jack.morgan@intel.com>
-
- 29 1月, 2013 2 次提交
-
-
由 Neerav Parikh 提交于
This patch fixes following deadlock caused by destroying of an FCoE interface with active NPIV ports on that interface. Call Trace: [<ffffffff814b7e88>] schedule+0x64/0x66 [<ffffffff814b6b4f>] schedule_timeout+0x36/0xe3 [<ffffffff81070c55>] ? update_curr+0xd6/0x110 [<ffffffff81071f6b>] ? hrtick_update+0x1b/0x4d [<ffffffff81072405>] ? dequeue_task_fair+0x1ca/0x1d9 [<ffffffff8106a369>] ? need_resched+0x1e/0x28 [<ffffffff814b7d14>] wait_for_common+0x9b/0xf1 [<ffffffff8106e7be>] ? try_to_wake_up+0x1e0/0x1e0 [<ffffffff814b7e22>] wait_for_completion+0x1d/0x1f [<ffffffff8105ae82>] flush_workqueue+0x116/0x2a1 [<ffffffff8105b357>] drain_workqueue+0x66/0x14c [<ffffffff8105b8ef>] destroy_workqueue+0x1a/0xcf [<ffffffffa009211e>] fc_remove_host+0x154/0x17f [scsi_transport_fc] [<ffffffffa00edbb8>] fcoe_if_destroy+0x184/0x1c9 [fcoe] [<ffffffffa00edc28>] fcoe_destroy_work+0x2b/0x44 [fcoe] [<ffffffff8105a82a>] process_one_work+0x1a8/0x2a4 [<ffffffffa00edbfd>] ? fcoe_if_destroy+0x1c9/0x1c9 [fcoe] [<ffffffff8105c396>] worker_thread+0x1db/0x268 [<ffffffff810604a3>] ? wake_up_bit+0x2a/0x2a [<ffffffff8105c1bb>] ? manage_workers.clone.16+0x1f6/0x1f6 [<ffffffff8105ffd6>] kthread+0x6f/0x77 [<ffffffff814c0304>] kernel_thread_helper+0x4/0x10 [<ffffffff8105ff67>] ? kthread_freezable_should_stop+0x4b/0x4b Call Trace: [<ffffffff814b7e88>] schedule+0x64/0x66 [<ffffffff814b8041>] schedule_preempt_disabled+0xe/0x10 [<ffffffff814b70a1>] __mutex_lock_common.clone.5+0x117/0x17a [<ffffffff814b7117>] __mutex_lock_slowpath+0x13/0x15 [<ffffffff814b6f76>] mutex_lock+0x23/0x37 [<ffffffff8125b890>] ? list_del+0x11/0x30 [<ffffffffa00edc84>] fcoe_vport_destroy+0x43/0x5f [fcoe] [<ffffffffa009130a>] fc_vport_terminate+0x48/0x110 [scsi_transport_fc] [<ffffffffa00913ef>] fc_vport_sched_delete+0x1d/0x79 [scsi_transport_fc] [<ffffffff8105a82a>] process_one_work+0x1a8/0x2a4 [<ffffffffa00913d2>] ? fc_vport_terminate+0x110/0x110 [scsi_transport_fc] [<ffffffff8105c396>] worker_thread+0x1db/0x268 [<ffffffff8105c1bb>] ? manage_workers.clone.16+0x1f6/0x1f6 [<ffffffff8105ffd6>] kthread+0x6f/0x77 [<ffffffff814c0304>] kernel_thread_helper+0x4/0x10 [<ffffffff8105ff67>] ? kthread_freezable_should_stop+0x4b/0x4b [<ffffffff814c0300>] ? gs_change+0x13/0x13 A prior attempt to fix this issue is posted here: http://lists.open-fcoe.org/pipermail/devel/2012-October/012318.html or http://article.gmane.org/gmane.linux.scsi.open-fcoe.devel/11924 Based on feedback and discussion with Neil Horman it seems that the above patch may have a case where the fcoe_vport_destroy() and fcoe_destroy_work() can race; hence that patch has been withdrawn with this patch that is trying to solve the same problem in a different way. In the current approach instead of removing the fcoe_config_mutex from the vport_delete callback function; I've chosen to delete all the NPIV ports first on a given root lport before continuing with the removal of the root lport. Signed-off-by: NNeerav Parikh <Neerav.Parikh@intel.com> Tested-by: NMarcus Dennis <marcusx.e.dennis@intel.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NRobert Love <robert.w.love@intel.com>
-
由 Neil Horman 提交于
When creating an fcoe interfce, we call fcoe_link_speed_update before we add the lports fcoe interface to the fc_hostlist. Since network device events like NETDEV_CHANGE are only processed if an fcoe interface is found with an underlying netdev that matches the netdev of the event. Since this processing in fcoe_device_notification is how link_speed changes get communicated to the libfc code (via fcoe_link_speed_update), we have a race condition - if a NETDEV_CHANGE event is sent after the call to fcoe_link_speed_update in fcoe_netdev_config, but before we add the interface to the fc_hostlist, we will loose the event and attributes like /sys/class/fc_host/hostX/speed will not get updated properly. Fix this by moving the add to the fc_hostlist above the serialized call to fcoe_netdev_config, ensuring that we catch netdev envents before we make a direct call to fcoe_link_speed_update. Also use this opportunity to clean up access to the fc_hostlist a bit by creating a fcoe_hostlist_del accessor and replacing the cleanup in fcoe_exit to use it properly. Tested by myself successfully [ Comment over 80 chars broken into multi-line by Robert Love to satisfy checkpatch.pl ] Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Reviewed-by: NYi Zou <yi.zou@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com>
-
- 15 12月, 2012 5 次提交
-
-
由 Yi Zou 提交于
Similarly they can be moved into libfcoe instead of being private to fcoe now. Also add comments particularly on the term LESB to the corresponding function. Signed-off-by: NYi Zou <yi.zou@intel.com> Cc: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Tested-by: NMarcus Dennis <marcusx.e.dennis@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com>
-
由 Yi Zou 提交于
With the previous patch, fcoe_link_speed_update() can be moved into libfcoe and exported to used by fcoe, bnx2fc, and etc. Signed-off-by: NYi Zou <yi.zou@intel.com> Cc: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Tested-by: NMarcus Dennis <marcusx.e.dennis@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com>
-
由 Yi Zou 提交于
Adds support to fcoe_port's newly added get_netdev fucntion pointer. Signed-off-by: NYi Zou <yi.zou@intel.com> Cc: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Tested-by: NMarcus Dennis <marcusx.e.dennis@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com>
-
由 Robert Love 提交于
This patch adds support for the new fcoe_sysfs control interface to fcoe.ko. It keeps the deprecated interface in tact and therefore either the legacy or the new control interfaces can be used. A mixed mode is not supported. A user must either use the new interfaces or the old ones, but not both. The fcoe_ctlr's link state is now driven by both the netdev link state as well as the fcoe_ctlr_device's enabled attribute. The link must be up and the fcoe_ctlr_device must be enabled before the FCoE Controller starts discovery or login. Signed-off-by: NRobert Love <robert.w.love@intel.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com>
-
由 Robert Love 提交于
This patch does a few things. 1) Makes /sys/bus/fcoe/ctlr_{create,destroy} interfaces. These interfaces take an <ifname> and will either create an FCoE Controller or destroy an FCoE Controller depending on which file is written to. The new FCoE Controller will start in a DISABLED state and will not do discovery or login until it is ENABLED. This pause will allow us to configure the FCoE Controller before enabling it. 2) Makes the 'mode' attribute of a fcoe_ctlr_device writale. This allows the user to configure the mode in which the FCoE Controller will start in when it is ENABLED. Possible modes are 'Fabric', or 'VN2VN'. The default mode for a fcoe_ctlr{,_device} is 'Fabric'. Drivers must implement the set_fcoe_ctlr_mode routine to support this feature. libfcoe offers an exported routine to set a FCoE Controller's mode. The mode can only be changed when the FCoE Controller is DISABLED. This patch also removes the get_fcoe_ctlr_mode pointer in the fcoe_sysfs function template, the code in fcoe_ctlr.c to get the mode and the assignment of the fcoe_sysfs function pointer to the fcoe_ctlr.c implementation (in fcoe and bnx2fc). fcoe_sysfs can return that value for the mode without consulting the LLD. 3) Make a 'enabled' attribute of a fcoe_ctlr_device. On a read, fcoe_sysfs will return the attribute's value. On a write, fcoe_sysfs will call the LLD (if there is a callback) to notifiy that the enalbed state has changed. This patch maintains the old FCoE control interfaces as module parameters, but it adds comments pointing out that the old interfaces are deprecated. Signed-off-by: NRobert Love <robert.w.love@intel.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com>
-
- 07 10月, 2012 1 次提交
-
-
由 Neerav Parikh 提交于
SCSI errors were generated while writing to LUNs connected via NPIV ports. Debugging this it was found that the FCoE packets transmitted via the NPIV ports were not tagged with correct user priority as negotiated with peer by DCB agent. This resulted in FCoE traffic going with priority zero(0) that did not have priority flow control (PFC) enabled for it. The initiator after transferring data to the target never saw any reply indicating the transfer was complete. This resulted in error recovery (ABTS) and SCSI command retries by the scsi-mid layer; eventually resulting in I/O errors. This patch fixes this issue by keeping the FCoE user priority information in the fcoe_interface instance that is common for both the physical port as well as NPIV ports connected to that physical port; instead of storing it in fcoe_port structure that has a per port instance. Signed-off-by: NNeerav Parikh <Neerav.Parikh@intel.com> Acked-by: NYi Zou <yi.zou@intel.com> Acked-by: NJohn Fastabend <john.r.fastabend@intel.com> Tested-by: NMarcus Dennis <marcusx.e.dennis@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 20 7月, 2012 2 次提交
-
-
由 Neil Horman 提交于
Noticed that we can shuffle the code around in fcoe_percpu_receive_thread a bit and avoid taking the fcoe_rx_list lock twice per iteration. This should improve throughput somewhat. With this change we take the lock, and check for new frames in a single critical section. Only if the list is empty do we drop the lock and re-acquire it after being signaled to wake up. Change Notes: v2) did some further cleanup on the patch by replacing the 2nd call of spin_lock/splice_init with a goto to the top of the outer loop. This allows me to change the inner while loop to an if conditional and remove the sencond check of kthread_should_stop. Based on suggestion from Vasu Dev. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NVasu Dev <vasu.dev@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Vasu Dev 提交于
The libfc is used by fcoe but fcoe agnostic, and therefore should not have any fcoe references. So renaming fcoe_dev_stats from libfc as its for fc_stats. After that libfc is fcoe string free except some strings for Open-FCoE.org. Signed-off-by: NVasu Dev <vasu.dev@intel.com> Acked-by : Robert Love <robert.w.love@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Acked-by: NBhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 23 5月, 2012 2 次提交
-
-
由 Robert Love 提交于
This patch has the SW FCoE driver and the bnx2fc driver make use of the new fcoe_sysfs API added earlier in this patch series. After this patch a fcoe_ctlr_device is allocated with private data in this order. +------------------+ +------------------+ | fcoe_ctlr_device | | fcoe_ctlr_device | +------------------+ +------------------+ | fcoe_ctlr | | fcoe_ctlr | +------------------+ +------------------+ | fcoe_interface | | bnx2fc_interface | +------------------+ +------------------+ libfcoe also takes part in this new model since it discovers and manages fcoe_fcf instances. The memory allocation is different for FCFs. I didn't want to impact libfcoe's fcoe_fcf processing, so this patch creates fcoe_fcf_device instances for each discovered fcoe_fcf. The two are paired using a (void * priv) member of the fcoe_ctlr_device. This allows libfcoe to continue maintaining its list of fcoe_fcf instances and simply attaches and detaches them from existing or new fcoe_fcf_device instances. Signed-off-by: NRobert Love <robert.w.love@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Robert Love 提交于
Currently the fcoe_ctlr associated with an interface is allocated as a member of struct fcoe_interface. This causes problems when attempting to use the new fcoe_sysfs APIs which allow us to allocate the fcoe_interface as private data to the fcoe_ctlr_device instance. The problem is that libfcoe wants to be able use pointer math to find a fcoe_ctlr's fcoe_ctlr_device as well as finding a fcoe_ctlr_device's assocated fcoe_ctlr. To do this we need to allocate the fcoe_ctlr_device, with private data for the LLD. The private data contains the fcoe_ctlr and its private data is the fcoe_interface. This patch only allocates the fcoe_interface with the fcoe_ctlr, the fcoe_ctlr_device will be added in a later patch, which will complete the below diagram- +------------------+ | fcoe_ctlr_device | +------------------+ | fcoe_ctlr | +------------------+ | fcoe_interface | +------------------+ This prep work will allow us to go from a fcoe_ctlr_device instance to its fcoe_ctlr as well as from a fcoe_ctlr to its fcoe_ctlr_device once the fcoe_sysfs API is in use (later patches in this series). Signed-off-by: NRobert Love <robert.w.love@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 10 5月, 2012 4 次提交
-
-
由 Dan Carpenter 提交于
We moved the locking in dd060e74 "[SCSI] fcoe: remove frame dropping code from fcoe_percpu_clean" but this unlock was missed. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Robert Love 提交于
The rtnl_mutex was held to protect calls to dev_uc_add and dev_uc_del. Holding rtnl is not required as those functions make use of the netif_addr_lock* API to protect the MAC changing. This change fixes the following regression by removing the rtnl usage when fcoe_update_src_mac is called. https://bugzilla.kernel.org/show_bug.cgi?id=42918 the existing dependency chain (in reverse order) is: -> #1 (&fip->ctlr_mutex){+.+...}: [<c1091f70>] lock_acquire+0x80/0x1b0 [<c147655d>] mutex_lock_nested+0x6d/0x340 [<f8970c32>] fcoe_ctlr_link_up+0x22/0x180 [libfcoe] [<f894620e>] fcoe_create+0x47e/0x6e0 [fcoe] [<f8973dd3>] fcoe_transport_create+0x143/0x250 [libfcoe] [<c10527e0>] param_attr_store+0x30/0x60 [<c1052696>] module_attr_store+0x26/0x40 [<c11a201e>] sysfs_write_file+0xae/0x100 [<c11449df>] vfs_write+0x8f/0x160 [<c1144cbd>] sys_write+0x3d/0x70 [<c147a0c4>] syscall_call+0x7/0xb -> #0 (rtnl_mutex){+.+.+.}: [<c109164b>] __lock_acquire+0x140b/0x1720 [<c1091f70>] lock_acquire+0x80/0x1b0 [<c147655d>] mutex_lock_nested+0x6d/0x340 [<c13a10c4>] rtnl_lock+0x14/0x20 [<f89445ac>] fcoe_update_src_mac+0x2c/0xb0 [fcoe] [<f8971712>] fcoe_ctlr_timer_work+0x712/0xb60 [libfcoe] [<c104fb69>] process_one_work+0x179/0x5d0 [<c10502f1>] worker_thread+0x121/0x2d0 [<c10550ed>] kthread+0x7d/0x90 [<c1481a82>] kernel_thread_helper+0x6/0x10 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fip->ctlr_mutex); lock(rtnl_mutex); lock(&fip->ctlr_mutex); lock(rtnl_mutex); *** DEADLOCK *** Signed-off-by: NRobert Love <robert.w.love@intel.com> Tested-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Vasu Dev 提交于
The fcoe controller has back references, therefore defer releasing master lport which gets freed along scsi_host_put and then free it once fcoe interface is fully cleaned. Signed-off-by: NVasu Dev <vasu.dev@intel.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Vasu Dev 提交于
Remove lport from net device and then do synchronize net device to flush inflight rx frames for the lport before doing fcoe_percpu_clean. In case of master lport, remove all rx packet handlers completely and then only do fcoe_percpu_clean. This required splitting fcoe_interface_cleanup to do remove part separately and for that added func fcoe_interface_remove and then call it from fcoe_if_destory before doing fcoe_percpu_clean. However if fcoe_interface_remove() is already called then don't call again from fcoe_interface_cleanup() to preserve its existing flows. This patch along with Neil's other patch to avoid soft irq context on ingress will avoid passing up frames on disabled lport as discussed in this mail thread:- http://lists.open-fcoe.org/pipermail/devel/2012-February/011947.htmlSigned-off-by: NVasu Dev <vasu.dev@intel.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 28 3月, 2012 5 次提交
-
-
由 Robert Love 提交于
The rtnl_lock is primarily used to serialize networking driver changes as well as to ensure that a networking driver is not removed when making changes to it. fcoe also uses the rtnl_lock to protect the fcoe hostlist. fcoe_create holds the rtnl_lock over the entirity of the routine including a the call to fcoe_ctlr_link_up. This causes the below deadlock because fcoe_ctlr_link_up acquires the fcoe_ctlr ctlr_mutex and this deadlocks with a libfcoe thread that acquires the fcoe_ctlr ctlr_mutex and then the rtnl_lock (to update a MAC address). This patch drops the rtnl_lock before calling fcoe_ctlr_link_up and therefore the deadlock is prevented. https://bugzilla.kernel.org/show_bug.cgi?id=42918 the existing dependency chain (in reverse order) is: -> #1 (&fip->ctlr_mutex){+.+...}: [<c1091f70>] lock_acquire+0x80/0x1b0 [<c147655d>] mutex_lock_nested+0x6d/0x340 [<f8970c32>] fcoe_ctlr_link_up+0x22/0x180 [libfcoe] [<f894620e>] fcoe_create+0x47e/0x6e0 [fcoe] [<f8973dd3>] fcoe_transport_create+0x143/0x250 [libfcoe] [<c10527e0>] param_attr_store+0x30/0x60 [<c1052696>] module_attr_store+0x26/0x40 [<c11a201e>] sysfs_write_file+0xae/0x100 [<c11449df>] vfs_write+0x8f/0x160 [<c1144cbd>] sys_write+0x3d/0x70 [<c147a0c4>] syscall_call+0x7/0xb -> #0 (rtnl_mutex){+.+.+.}: [<c109164b>] __lock_acquire+0x140b/0x1720 [<c1091f70>] lock_acquire+0x80/0x1b0 [<c147655d>] mutex_lock_nested+0x6d/0x340 [<c13a10c4>] rtnl_lock+0x14/0x20 [<f89445ac>] fcoe_update_src_mac+0x2c/0xb0 [fcoe] [<f8971712>] fcoe_ctlr_timer_work+0x712/0xb60 [libfcoe] [<c104fb69>] process_one_work+0x179/0x5d0 [<c10502f1>] worker_thread+0x121/0x2d0 [<c10550ed>] kthread+0x7d/0x90 [<c1481a82>] kernel_thread_helper+0x6/0x10 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fip->ctlr_mutex); lock(rtnl_mutex); lock(&fip->ctlr_mutex); lock(rtnl_mutex); *** DEADLOCK *** Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Neil Horman 提交于
There is potentially lots of contention for the rx_list_lock. On a cpu that is receiving lots of fcoe traffic, the softirq context has to add and release the lock for every frame it receives, as does the receiving per-cpu thread. We can reduce this contention somewhat by altering the per-cpu threads loop such that when traffic is detected on the fcoe_rx_list, we splice it to a temporary list. In this way, we can process multiple skbs while only having to acquire and release the fcoe_rx_list lock once. [ Braces around single statement while loop removed by Robert Love to satisfy checkpath.pl. ] Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NVasu Dev <vasu.dev@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Neil Horman 提交于
commit e7a51997 ([SCSI] fcoe: flush per-cpu thread work when destroying interface) added a skb flush to the fcoe_rx_list, which ensures that we push any pending frames on the list through the per-cpu receive thread. Because of this, its redundant to lock and scan the list first, dropping any arriving frames. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NVasu Dev <vasu.dev@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Neil Horman 提交于
The fcoe sw recive packet function (fcoe_rcv) only ever executes in softirq context. Given that, and the fact that no use of the fcoe_rx_list is made in irq context, its not necessecary to disable bottom halves while actually receiving the frame. Convert spin_*_bh calls in that function to their lock-only equivalents Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NVasu Dev <vasu.dev@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Neil Horman 提交于
commit 859b7b64 introduced the ability to call fcoe_recv_frame in softirq context. While this is beneficial to performance, its not safe to do, as it breaks the serialization of access to the lport structure (i.e. when an fcoe interface is being torn down, theres no way to serialize the teardown effort with the completion of receieve operations occuring in softirq context. As a result, lport (and other) data structures can be read and modified in parallel leading to corruption. Most notable is the vport list, which is protected by a mutex, that will cause a panic if a softirq receive while said mutex is locked. Additionaly, the ema_list, discussed here: http://lists.open-fcoe.org/pipermail/devel/2012-February/011947.html Can be corrupted if a list traversal occurs in softirq context at the same time as a list delete in process context. And generally the lport state variables will not be stable, and may lead to unpredictable results. The most direct fix is to remove the bits from the above commit that allowed fcoe_recv_frame to be called in softirq context. We just force all frames to be handled by the per-cpu rx threads. This will allow the fcoe_if_destroy's use of fcoe_percpu_clean to function properly, ensuring that no frames are being received while the lport is being torn down. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Reviewed-by: NVasu Dev <vasu.dev@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 20 3月, 2012 2 次提交
-
-
由 Cong Wang 提交于
Signed-off-by: NCong Wang <amwang@redhat.com>
-
由 Yi Zou 提交于
Fix a bug when using 'ethtool -K ethx tx off' to turn off tx ip checksum, FCoE CRC offload should not be impacte. The skb_checksum_help() is needed only if it's not FCoE traffic for ip checksum, regardless of ethtool toggling the tx ip checksum on or off. Instead of using CHECKSUM_PARTIAL, we will use CHECKSUM_UNNECESSARY as a proper indication to avoid sw ip checksum on FCoE frames. Ref. to original discussion thread: http://patchwork.ozlabs.org/patch/146567/ CC: "James E.J. Bottomley" <JBottomley@parallels.com> CC: Robert Love <robert.w.love@intel.com> Signed-off-by: NYi Zou <yi.zou@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 3月, 2012 1 次提交
-
-
由 Yi Zou 提交于
Fix a bug when using 'ethtool -K ethx tx off' to turn off tx ip checksum, FCoE CRC offload should not be impacte. The skb_checksum_help() is needed only if it's not FCoE traffic for ip checksum, regardless of ethtool toggling the tx ip checksum on or off. Instead of using CHECKSUM_PARTIAL, we will use CHECKSUM_UNNECESSARY as a proper indication to avoid sw ip checksum on FCoE frames. Ref. to original discussion thread: http://patchwork.ozlabs.org/patch/146567/ CC: "James E.J. Bottomley" <JBottomley@parallels.com> CC: Robert Love <robert.w.love@intel.com> Signed-off-by: NYi Zou <yi.zou@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
-
- 19 2月, 2012 6 次提交
-
-
由 Robert Love 提交于
The reference counting was necessary on these instances because it was possible for NPIV ports to be destroyed after the N_Port. A previous patch ensures that all NPIV ports are destroyed before the N_Port making the need to track references on the interface unnecessary. Signed-off-by: NRobert Love <robert.w.love@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Robert Love 提交于
Currently all port deletion is routed though the FCoE workqueue (fcoe_wq). When fc_remove_host is called on an N_Port (for example, from fcoe_destroy) the vports are queued into a FC Transport workqueue. fc_remove_host flushes that queue and each vport is passed to fcoe's fcoe_vport_destroy, which simply queues the associated fcoe_ports for later deletion. This queue cannot be flushed within the N_Ports destroy path because of circular locking issues. The result is that the NPIV ports are destroyed after the N_Port, which is reverse of how they are created. This quirk causes fcoe to keep references on the fcoe_interface shared by each of these ports (N_Port and NPIV). Changing the ordering such that NPIV ports are destroyed before the N_Port will allow us to remove reference counting on the fcoe_interface instances. This patch simply allows fcoe_vport_destory to destroy NPIV ports without deferring them to a workqueue context. This ensures that when fc_remove_host is called the NPIV ports will be destroyed first before the N_Port and allows reference counting on the fcoe's fcoe_interface to be remove in a later patch. Signed-off-by: NRobert Love <robert.w.love@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Robert Love 提交于
The label implies that it should be called when there is 'nomod.' I read that to mean that the module reference 'get' failed. However, it's only called when the module reference 'get' succeeded. I think it makes more sense to name the label, 'out_putmod' since it should be called when we need to 'put' the module reference taken in the routine before returning. Signed-off-by: NRobert Love <robert.w.love@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Neerav Parikh 提交于
Allow FDMI attributes to be exposed via the fc_host class object for the fcoe driver. Signed-off-by: NNeerav Parikh <neerav.parikh@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Neerav Parikh 提交于
Allow FDMI attributes to be exposed via the fc_host class object for the fcoe driver. Signed-off-by: NNeerav Parikh <neerav.parikh@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Acked-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Neerav Parikh 提交于
This adds support for updating the FC-GS FDMI attributes in the fcoe driver. Signed-off-by: NNeerav Parikh <neerav.parikh@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Acked-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 16 1月, 2012 2 次提交
-
-
由 Bart Van Assche 提交于
Move the definition of the global variable fcoe_debug_logging from fcoe.h to fcoe.c. Avoid that sparse complains about missing declarations for local functions or variables by declaring these static. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NYi Zou <yi.zou@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Yi Zou 提交于
This is a regression introduced by commit 1ff9918b The else statement here is breaking the initiator logic of allocating xid from the offloaded em xid pool for READ I/O only to use DDP, as shown by the snippet of trace below, where the WRITE is using xid 0x5 from the offloaded em xid pool: Protocol VID Len S_ID D_ID OX_ID RX_ID Summary .. *FCP 228 96 0b.08.01 -> 01.0f.00 0x0005 0xffff SCSI: Write(10) LUN: 0x00 FCP 228 76 01.0f.00 -> 0b.08.01 0x0005 0x828d XFER_RDY ... The bug is in the else statement, for both initiator and target, the new command will have FC frame header bit 23 (FC_FC_EX_CTX) cleared as it was originated from the initiator. Also, this is assuming the frame header is already filled up, which is only true for target since for initiator, this is a new frame and oem_match gets called when em tries get xid for this i/o before it is filled up and sent out. The fix is to check if there is a fc_fcp_pkt associated w/ this frame from fr_fsp(fp), since fr_fsp(fp) is NULL for tcm_fc target and non-I/O frame in initiator. This should also return true for target only if it is an FC_RCTL_DD_UNSOL_CMD and rx_id is not allocated. Signed-off-by: NYi Zou <yi.zou@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 11 1月, 2012 1 次提交
-
-
由 Robert Love 提交于
skb_linearize already has a check for skb_is_nonlinear, there is no need to duplicate the check in fcoe.c. This patch simply removes the unnecessary check and calls skb_linearize unconditionally. Reported-by: Npatrick kelle <patrick.kelle81@gmail.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Acked-by: Npatrick kelle <patrick.kelle81@gmail.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 15 12月, 2011 1 次提交
-
-
由 john fastabend 提交于
Use DCB notifiers to set the skb priority to allow packets to be steered and tagged correctly over DCB enabled drivers that setup traffic classes. This allows queue_mapping() routines to be removed in these drivers that were previously inspecting the ethertype of every skb to mark FCoE/FIP frames. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NRobert Love <robert.w.love@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 14 12月, 2011 1 次提交
-
-
由 Thomas Gleixner 提交于
The error exit path leaks preempt count. Add the missing put_cpu(). Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NYi Zou <yi.zou@intel.com> Cc: stable@kernel.org Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 31 10月, 2011 1 次提交
-
-
由 Vasu Dev 提交于
Adds more cases to do flogi retry, now also retry on getting bad response due to either no ELS response or flogi response payload length not large enough. In those cases flogi was not retried and that was leaving lport offline. Signed-off-by: NVasu Dev <vasu.dev@intel.com> Tested-by: NBhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: NYi Zou <yi.zou@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 16 10月, 2011 1 次提交
-
-
由 Bhanu Prakash Gollapudi 提交于
Except for obtaining the netdev from lport, fcoe_get_lesb is the common code for the LLDs. Signed-off-by: NBhanu Prakash Gollapudi <bprakash@broadcom.com> Acked-by: NYi Zou <yi.zou@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
- 03 10月, 2011 2 次提交
-
-
由 Vasu Dev 提交于
Currently fcoe_ddp_min doesn't have default value so by default not used, so setting up default value as 4k as this works better by avoiding overhead of programing DDP for small IOs. Signed-off-by: NVasu Dev <vasu.dev@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NYi Zou <yi.zou@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-
由 Vasu Dev 提交于
Use real dev in case it has HW vlan acceleration support since in this case the real dev would do needed vlan processing, this way unnecessary vlan layer processing avoided and it gives slightly better IOPS with 512B size IOs. Signed-off-by: NVasu Dev <vasu.dev@intel.com> Tested-by: NRoss Brattain <ross.b.brattain@intel.com> Signed-off-by: NYi Zou <yi.zou@intel.com> Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
-