- 24 9月, 2009 1 次提交
-
-
由 Roland Dreier 提交于
Holding agent->lock across cancel_delayed_work() (which does del_timer_sync()) in ib_cancel_rmpp_recvs() leads to lockdep reports of possible lock-timer deadlocks if a consumer ever does something that connects agent->lock to a lock taken in IRQ context (cf http://marc.info/?l=linux-rdma&m=125243699026045). Fix this by changing the list items to a new state "CANCELING" while holding the lock, and then canceling the delayed work without holding the lock. If the delayed work runs after the lock is dropped, it will see the state is CANCELING and return immediately, so the list will stay stable while we traverse it with the lock not held. Reviewed-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 10 9月, 2009 3 次提交
-
-
由 Steve Wise 提交于
If the cm_id of a connect request is destroyed prior to the ULP accepting or rejecting the connection, then the provider never cleans up the connection. The iwcm should explicitly reject these connections if the cm_id is destroyed. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
FW mismatches can cause a crash in the iw_cxgb3 event handler. - NULL the t3cdev->ulp pointer on failures in cxio_rdev_open() - Silently ignore events when the ulp ptr is NULL in iwch_err_handler() Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 07 9月, 2009 2 次提交
-
-
由 Hal Rosenstock 提交于
MADs are UD and can be dropped if there are no receives posted, so allow receive queue size to be set with a module parameter in case the queue needs to be lengthened. Send side tuning is done for symmetry with receive. Signed-off-by: NHal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Lockdep reported a possible deadlock with cm_id_priv->lock, mad_agent_priv->lock and mad_agent_priv->timed_work.timer; this happens because the mad module does cancel_delayed_work(&mad_agent_priv->timed_work); while holding mad_agent_priv->lock. cancel_delayed_work() internally does del_timer_sync(&mad_agent_priv->timed_work.timer). This can turn into a deadlock because mad_agent_priv->lock is taken inside cm_id_priv->lock, so we can get the following set of contexts that deadlock each other: A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock B: holding mad_agent_priv->lock, waiting for del_timer_sync() C: interrupt during mad_agent_priv->timed_work.timer that takes cm_id_priv->lock Fix this by using the new __cancel_delayed_work() interface (which internally does del_timer() instead of del_timer_sync()) in all the places where we are holding a lock. Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757Reported-by: NBart Van Assche <bart.vanassche@gmail.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 06 9月, 2009 34 次提交
-
-
由 Chien Tung 提交于
Old query_port code reports static MTU and link state values. Instead, map actual MTU to next largest IB_MTU_* constant and correctly report link state. Cc: Steve Wise <swise@opengridcomputing.com> Reported-by: NJeff Squyres <jsquyres@cisco.com> Signed-off-by: NChien Tung <chien.tin.tung@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Don Wood 提交于
The disconn routine has been reworked to acoomodate the terminate and flushing changes. The routine has been reorganized to make all the decisions at the start then it performs all the required operations. This simplified the lock handling and is easier to follow. Signed-off-by: NDon Wood <donald.e.wood@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Don Wood 提交于
Use the flush status to fill in cqe status when a specific error has been identified. Subsequent flushed completions still use the flushed value. Signed-off-by: NDon Wood <donald.e.wood@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Don Wood 提交于
When a flush request is given to the hw, it will place one cqe marked as flushed (unless there is nothing to flush). An application that is waiting for all wqe's to complete will be left hanging. This modifies poll_cq to return the correct number of flushes for the pending elements on the wq. Signed-off-by: NDon Wood <donald.e.wood@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Don Wood 提交于
When an asynchronous event occurs that requires a terminate, it is sometimes possible to identify the wqe in error. This change uses flush to get this information to the poll routine. The flush operation puts the status into the cqe. If this information is not available, it continues to use the more generic flush code as before. Signed-off-by: NDon Wood <donald.e.wood@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Don Wood 提交于
Implement the sending and receiving of Terminate packets. Signed-off-by: NDon Wood <donald.e.wood@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Don Wood 提交于
CQ errors are not being handled correctly. Put in the the upcall for CQ errors. Signed-off-by: NDon Wood <donald.e.wood@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Don Wood 提交于
When a QP is destroyed, unprocessed CQ entries could still reference the QP. This change zeroes the context value at QP destroy time. By skipping over cqe's with a zero context, poll_cq no longer processes a cqe for a destroyed QP. Signed-off-by: NDon Wood <donald.e.wood@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Don Wood 提交于
The routine to allocate a cqp request is not called from process context code. Since it is not OK to sleep, it needs to use GFP_ATOMIC not GFP_KERNEL. Signed-off-by: NDon Wood <donald.e.wood@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Don Wood 提交于
The code currently has a work structure in the QP. This requires a lock and a pending flag to ensure there is never more than one request active. When two events happen quickly (such as FIN and LLP CLOSE), it causes unnecessary timeouts since the second one is dropped. This fix allocates memory for the work request so the second one can be queued. A lock is removed since it is no longer needed. Signed-off-by: NDon Wood <donald.e.wood@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Don Wood 提交于
During termination, it is possible for the refcnt to go to zero while the worker thread is posting events upward. This fix increments the refcnt before the request is passed to the worker thread. The thread decrements the refcnt when the request is completed. Signed-off-by: NDon Wood <donald.e.wood@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Jack Morgenstein 提交于
Userspace apps are supposed to release all ib device resources if they receive a fatal async event (IBV_EVENT_DEVICE_FATAL). However, the app has no way of knowing when the device has come back up, except to repeatedly attempt ibv_open_device() until it succeeds. However, currently there is no protection against the open succeeding while the device is in being removed following the fatal event. In this case, the open will succeed, but as a result the device waits in the middle of its removal until the new app releases its resources -- and the new app will not do so, since the open succeeded at a point following the fatal event generation. This patch adds an "active" flag to the device. The active flag is set to false (in the fatal event flow) before the "fatal" event is generated, so any subsequent ibv_dev_open() call to the device will fail until the device comes back up, thus preventing the above deadlock. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Arputham Benjamin 提交于
When the mthca driver uses the same name for interrupts for every device in the system. This can make it very confusing trying to work out exactly which device MSI-X interrupts are for. Change the driver to add the PCI name of the device to the interrupt name. Signed-off-by: NArputham Benjamin <abenjamin@sgi.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
mthca_ib_lock_cqs()/mthca_ib_unlock_cqs() are helper functions that lock/unlock both CQs attached to a QP in the proper order to avoid AB-BA deadlocks. Annotate this so sparse can understand what's going on (and warn us if we misuse these functions). Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
mthca_reset.c doesn't have any function annotations, so there's no reason to include <linux/init.h>. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
mthca_config_reg.h was including <asm/page.h> for no reason -- the whole file is just defines of constants, so it's entirely self-contained. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Jack Morgenstein 提交于
Userspace apps are supposed to release all ib device resources if they receive a fatal async event (IBV_EVENT_DEVICE_FATAL). However, the app has no way of knowing when the device has come back up, except to repeatedly attempt ibv_open_device() until it succeeds. However, currently there is no protection against the open succeeding while the device is in being removed following the fatal event. In this case, the open will succeed, but as a result the device waits in the middle of its removal until the new app releases its resources -- and the new app will not do so, since the open succeeded at a point following the fatal event generation. This patch adds an "active" flag to the device. The active flag is set to false (in the fatal event flow) before the "fatal" event is generated, so any subsequent ibv_dev_open() call to the device will fail until the device comes back up, thus preventing the above deadlock. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
mlx4_ib_lock_cqs()/mlx4_ib_unlock_cqs() are helper functions that lock/unlock both CQs attached to a QP in the proper order to avoid AB-BA deadlocks. Annotate this so sparse can understand what's going on (and warn us if we misuse these functions). Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roel Kluin 提交于
dev->ibdev.iwcm allocation may fail, prevent a dereference. Signed-off-by: NRoel Kluin <roel.kluin@gmail.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Jack Morgenstein 提交于
Since the original commit 883a99c7 ("[IB] uverbs: Add a mask of device methods allowed for userspace"), the uverbs core returns EINVAL for commands not implemented by a specific low-level driver. This creates a problem that there is no way to tell the difference between an unimplemented command and an implemented one which is incorrectly invoked (which also returns EINVAL). The fix is to have unimplemented commands return ENOSYS. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Yossi Etigin 提交于
Until now, retries were only sent when joining a multicast group. This patch will adds retries when leaving a multicast group as well. Signed-off-by: NRon Livne <ronli@voltaire.com> Signed-off-by: NYossi Etigin <yosefe@voltaire.com> Acked-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Marcin Slusarz 提交于
Replace open-coded reimplementations with printk_once(). Signed-off-by: NMarcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Tobias Klauser 提交于
Use the %pM conversion specifier to print a MAC address. Signed-off-by: NTobias Klauser <klto@zhaw.ch> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Rather than just defining static spinlock_t variables and then initializing them later in init functions, simply define them with DEFINE_SPINLOCK() and remove the calls to spin_lock_init(). This cleans up the source a tad and also shrinks the compiled code; eg on x86-64: add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40) function old new delta ib_uverbs_init 336 326 -10 ib_mad_init_module 147 137 -10 ib_sa_init 123 103 -20 Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
The hop count field in a directed route MAD is only allowed to be in the range 0 to 63 (by spec). Check that this really is the case to avoid accessing outside the bounds of the hop array. Reported-by: NRoel Kluin <roel.kluin@gmail.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Jason Gunthorpe 提交于
Check that the format of multicast link addresses is correct before taking them from dev->mc_list to priv->multicast_list. This way we never try to send a bogus address to the SA, which prevents badness from erronous 'ip maddr addr add', broken bonding drivers, etc. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
IPoIB currently must use irqsave locking for priv->lock, since it is taken from interrupt context in one path. However, ipoib_send() does skb_orphan(), and the network stack locking is not IRQ-safe. Therefore we need to make sure we don't hold priv->lock when calling ipoib_send() to avoid lockdep warnings (the code was almost certainly safe in practice, since the only code path that takes priv->lock from interrupt context would never call into the network stack). Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757Reported-by: NBart Van Assche <bart.vanassche@gmail.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roel Kluin 提交于
strlcpy() will always null terminate the string. node_desc is not guaranteed to be NUL-terminated so just use memcpy(). Signed-off-by: NRoel Kluin <roel.kluin@gmail.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Joachim Fenkes 提交于
The driver was reporting CQE flags in the wrong bit positions, causing consumers to miss incoming immediate data. Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Joachim Fenkes 提交于
The old code used a lot of hard-coded values, which might not be valid in all environments (especially routed fabrics or partitioned subnets). Copy as much information as possible from the incoming request to correct that. Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Alexander Schmidt 提交于
Make port autodetect mode the default for the ehca driver. The autodetect code has been in the kernel for several releases now and has proved to be stable. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
A close/abort while waiting for a wr_ack during connection migration can cause a hung process in iwch_accept_cr/iwch_reject_cr. The fix is to set rpl_error/rpl_done and wake up the waiters when we get a close/abort while in MPA_REQ_RCVD state. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
- Keep ref on connection request endpoints until either accepted or rejected so it doesn't get freed early. - Endpoint flags now need to be set via atomic bitops because they can be set on both the iw_cxgb3 workqueue thread and user disconnect threads. - Don't move out of CLOSING too early due to multiple calls to iwch_ep_disconnect. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-