- 11 3月, 2008 1 次提交
-
-
由 Steve Wise 提交于
cm_work_handler() can access cm_id_priv after it drops its reference by calling iwch_deref_id(), which might cause it to be freed. The fix is to look at whether IWCM_F_CALLBACK_DESTROY is set _before_ dropping the reference. Then if it was set, free the cm_id on this thread. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 01 3月, 2008 3 次提交
-
-
由 Pete Wyckoff 提交于
Commit a3cd7d90 ("IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs") caused a regression for iSER and was reverted in e5507736. This change attempts to redo the original patch so that all used FMR entries are flushed when ib_flush_fmr_pool() is called without affecting the normal FMR pool cleaning thread. Simply move used entries from the clean list onto the dirty list in ib_flush_fmr_pool() before letting the cleanup thread do its job. Signed-off-by: NPete Wyckoff <pw@osc.edu> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Pete Wyckoff 提交于
This reverts commit a3cd7d90. The original commit breaks iSER reliably, making it complain: iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11 The FMR cleanup thread runs ib_fmr_batch_release() as dirty entries build up. This commit causes clean but used FMR entries also to be purged. During that process, another thread can see that there are no free FMRs and fail, even though there should always have been enough available. Signed-off-by: NPete Wyckoff <pw@osc.edu> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Sean Hefty 提交于
When a CM MAD is received, it is queued to a CM workqueue for processing. The queued work item references the port and device on which the MAD was received. If that device is removed from the system before the work item can execute, the work item will reference freed memory. To fix this, flush the workqueue after unregistering to receive MAD, and before the device is be freed. Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 16 2月, 2008 1 次提交
-
-
由 Li Zefan 提交于
If kobject_create_and_add() fails and returns NULL, the current code in ib_device_register_sysfs() does not set ret and hence returns 0. Set ret to -ENOMEM for this failure, so that the caller knows that ib_device_register_sysfs() actually failed. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 15 2月, 2008 1 次提交
-
-
由 Sean Hefty 提交于
There's an undesirable interaction with issuing MRA requests to increase connection timeouts and the listen backlog. When the rdma_cm receives a connection request, it queues an MRA with the ib_cm. (The ib_cm will send an MRA if it receives a duplicate REQ.) The rdma_cm will then create a new rdma_cm_id and give that to the user, which in this case is the rdma_user_cm. If the listen backlog maintained in the rdma_user_cm is full, it destroys the rdma_cm_id, which in turns destroys the ib_cm_id. The ib_cm_id generates a REJ because the state of the ib_cm_id has changed to MRA sent, versus REQ received. When the backlog is full, we just want to drop the REQ so that it is retried later. Fix this by deferring queuing the MRA until after the user of the rdma_cm has examined the connection request. Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 13 2月, 2008 2 次提交
-
-
由 Roland Dreier 提交于
Commit 9af57b7a ("IB/cm: Add basic performance counters") introduced a bug in how the reference count for cm_class.subsys.kobj was handled: the path that released a device did a kobject_put() on that kobject, but there was no kobject_get() in the path the handles adding a device. So the reference count ended up too low, which leads to bad things. Fix up and simplify the reference counting to avoid this. (Actually, I introduced the bug when fixing the patch up to match some of Greg's kobject changes, but who's counting) Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Pesky little devils, sneaking around... Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 05 2月, 2008 2 次提交
-
-
由 Or Gerlitz 提交于
Allocate memory for the page_list field of struct ib_pool_fmr only when caching is enabled for the FMR pool, since the field is not used otherwise. This can save significant amounts of memory for large pools with caching turned off. Signed-off-by: NOr Gerlitz <ogerlitz@voltaire.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Sean Hefty 提交于
Paths with hop_limit > 1 indicate that the connection will be routed between IB subnets. Update the subnet local field in the CM REQ based on the hop_limit value. In addition, if the path is routed, then set the LIDs in the REQ to the permissive LIDs. This is used to indicate to the passive side that it should use the LIDs in the received local route header (LRH) associated with the REQ when programming the QP. This is a temporary work-around to the IB CM to support IB router development until the IB router specification is completed. It is not anticipated that this work-around will cause any interoperability issues with existing stacks or future stacks that will properly support IB routers when defined. Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 29 1月, 2008 3 次提交
-
-
由 Denis V. Lunev 提交于
Needed to propagate it down to the ip_route_output_flow. Signed-off-by: NDenis V. Lunev <den@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Denis V. Lunev 提交于
in_dev_find() need a namespace to pass it to fib_get_table(), so add an argument. Signed-off-by: NDenis V. Lunev <den@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 1月, 2008 14 次提交
-
-
由 Olaf Kirch 提交于
When a FMR is released via ib_fmr_pool_unmap(), the FMR usually ends up on the free_list rather than the dirty_list (because we allow a certain number of remappings before actually requiring a flush). However, ib_fmr_batch_release() only looks at dirty_list when flushing out old mappings. This means that when ib_fmr_pool_flush() is used to force a flush of the FMR pool, some dirty FMRs that have not reached their maximum remap count will not actually be flushed. Fix this by flushing all FMRs that have been used at least once in ib_fmr_batch_release(). Signed-off-by: NOlaf Kirch <olaf.kirch@oracle.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Olaf Kirch 提交于
Normally, the serial numbers for flush requests and flushes executed for an FMR pool should be in sync. However, if the FMR pool flushes dirty FMRs because the dirty_watermark was reached, we wake up the cleanup thread and let it do its stuff. As a side effect, the cleanup thread increments pool->flush_ser, which leaves it one higher than pool->req_ser. The next time the user calls ib_flush_fmr_pool(), the cleanup thread will be woken up, but ib_flush_fmr_pool() won't wait for the flush to complete because flush_ser is already past req_ser. This means the FMRs that the user expects to be flushed may not have all been flushed when the function returns. Fix this by telling the cleanup thread to do work exclusively by incrementing req_ser, and by moving the comparison of dirty_len and dirty_watermark into ib_fmr_pool_unmap(). Signed-off-by: NOlaf Kirch <olaf.kirch@oracle.com>
-
由 Roland Dreier 提交于
In addition to being overly complex, the locking in user_mad.c is broken: there were multiple reports of deadlocks and lockdep warnings. In particular it seems that a single thread may end up trying to take the same rwsem for reading more than once, which is explicitly forbidden in the comments in <linux/rwsem.h>. To solve this, we change the locking to use plain mutexes instead of rwsems. There is one mutex per open file, which protects the contents of the struct ib_umad_file, including the array of agents and list of queued packets; and there is one mutex per struct ib_umad_port, which protects the contents, including the list of open files. We never hold the file mutex across calls to functions like ib_unregister_mad_agent(), which can call back into other ib_umad code to queue a packet, and we always hold the port mutex as long as we need to make sure that a device is not hot-unplugged from under us. This even makes things nicer for users of the -rt patch, since we remove calls to downgrade_write() (which is not implemented in -rt). Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Sean Hefty 提交于
By default, the responder_resources parameter is set to that received in a connection request. The passive side may override this value when accepting the connection. Use the value provided by the passive side when transitioning the QP to RTR state, rather than the value given in the connect request. Without this change, the RTR transition may fail if the passive side supports fewer responder_resources than that in the request. For code consistency and to protect against QP destruction, restructure overriding initiator_depth to match how responder_resources is set. Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Rolf Manderscheid 提交于
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't use link-local scope in multicast GIDs. The existing routines that map IP/IPv6 multicast addresses into IB link-level addresses hard-code the scope to link-local, and they also leave the partition key field uninitialised. This patch adds a parameter (the link-level broadcast address) to the mapping routines, allowing them to initialise both the scope and the P_Key appropriately, and fixes up the call sites. The next step will be to add a way to configure the scope for an IPoIB interface. Signed-off-by: NRolf Manderscheid <rvm@obsidianresearch.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Sean Hefty 提交于
This is based on user feedback from Doug Ledford at RedHat: Events that occur on an rdma_cm_id are reported to userspace through an event channel. Connection request events are reported on the event channel associated with the listen. When the connection is accepted, a new rdma_cm_id is created and automatically uses the listen event channel. This is suboptimal where the user only wants listen events on that channel. Additionally, it may be desirable to have events related to connection establishment use a different event channel than those related to already established connections. Allow the user to migrate an rdma_cm_id between event channels. All pending events associated with the rdma_cm_id are moved to the new event channel. Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Vladimir Sokolovsky 提交于
Enable conn_id remove on the passive side after connection establishment. This corrects an issue where the IB driver can't be unloaded after running applications over RDS. The 'dev_remove' counter does not reach 0 for established connections on the passive side. This problem is limited to device removal, and only occurs on the passive side if there are established connections. Signed-off-by: NVladimir Sokolovsky <vlad@mellanox.co.il> Reviewed-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Sean Hefty 提交于
In cancel_mads(), MADs are moved from the wait_list and local_list to a cancel_list for processing. However, the structures on these two lists are not the same. The wait_list references struct ib_mad_send_wr_private, but local_list references struct ib_mad_local_private. Cancel_mads() treats all items moved to the cancel_list as struct ib_mad_send_wr_private. This leads to a system crash when requests are moved from the local_list to the cancel_list. Fix this by leaving local_list alone. All requests on the local_list have completed are just awaiting processing by a queued worker thread. Bug (crash) reported by Dotan Barak <dotanb@dev.mellanox.co.il>. Problem with local_list access reported by Robert Reynolds <rreynolds@opengridcomputing.com>. Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Sean Hefty 提交于
Add performance/debug counters to track sent/received messages, retries, and duplicates. Counters are tracked per CM message type, per port. The counters are always enabled, so intrusive state tracking is not done. Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Sean Hefty 提交于
To allow ULPs to tune timeout values and capture retry statistics, report the number of times that a mad send operation was retried. For RMPP mads, report the total number of times that the any portion (send window) of the send operation was retried. Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Sean Hefty 提交于
P_key changes can invalidate multicast groups. Report errors on all multicast groups affected by a pkey change. Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Welch 提交于
The local loopback of an outgoing DR SMP response is limited to those that originate at the driver specific SMA implementation during the driver specific process_mad() function. This patch enables a returning DR SMP originating in userspace (or elsewhere) to be delivered to the local managment stack. In this specific case the driver process_mad() function does not consume or process the MAD, so a reponse mad has not be created and the original MAD must manually be copied to the MAD buffer that is to be handed off to the local agent. Signed-off-by: NSteve Welch <swelch@systemfabricworks.com> Acked-by: NHal Rosenstock <hal@xsigo.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Ralph Campbell 提交于
In ib_mad_recv_done_handler(), the response pointer is checked for NULL after allocating it. It is then checked again in the local process_mad() path but there is no possibility of it changing in between. Signed-off-by: NRalph Campbell <ralph.campbell@qlogic.com> Acked-by: NHal Rosenstock <hal@xsigo.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
Set the initiator depth and responder resources to the device max values for new connect request events in the iWARP connection manager. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 25 1月, 2008 2 次提交
-
-
由 Greg Kroah-Hartman 提交于
There is no need for kobject_unregister() anymore, thanks to Kay's kobject cleanup changes, so replace all instances of it with kobject_put(). Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Greg Kroah-Hartman 提交于
Stop using kobject_register, as this way we can control the sending of the uevent properly, after everything is properly initialized. Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <mshefty@ichips.intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 31 10月, 2007 1 次提交
-
-
由 Anton Blanchard 提交于
I noticed my machine was at a constant load average of 1. This was because ib_create_fmr_pool calls kthread_create but does not immediately wake the thread up. Change to using kthread_run so we enter ib_fmr_cleanup_thread(), set TASK_INTERRUPTIBLE, then go to sleep. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 24 10月, 2007 1 次提交
-
-
由 Jens Axboe 提交于
Most drivers need to set length and offset as well, so may as well fold those three lines into one. Add sg_assign_page() for those two locations that only needed to set the page, where the offset/length is set outside of the function context. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 23 10月, 2007 1 次提交
-
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 20 10月, 2007 1 次提交
-
-
由 Roland Dreier 提交于
Commit 9ead190b ("IB/uverbs: Don't serialize with ib_uverbs_idr_mutex") rewrote how userspace objects are looked up in the uverbs module's idrs, and introduced a severe bug in the process: there is no checking that an operation is being performed by the right process any more. Fix this by adding the missing check of uobj->context in __idr_get_uobj(). Apparently everyone is being very careful to only touch their own objects, because this bug was introduced in June 2006 in 2.6.18, and has gone undetected until now. Cc: stable <stable@kernel.org> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 19 10月, 2007 1 次提交
-
-
由 Anton Arapov 提交于
There is a justifying patch for Stephen's patches. Stephen's patches disallows using a port range of one single port and brakes the meaning of the 'remaining' variable, in some places it has different meaning. My patch gives back the sense of 'remaining' variable. It should mean how many ports are remaining and nothing else. Also my patch allows using a single port. I sure we must be able to use mentioned port range, this does not restricted by documentation and does not brake current behavior. usefull links: Patches posted by Stephen Hemminger http://marc.info/?l=linux-netdev&m=119206106218187&w=2 http://marc.info/?l=linux-netdev&m=119206109918235&w=2 Andrew Morton's comment http://marc.info/?l=linux-kernel&m=119248225007737&w=2 1. Allows using a port range of one single port. 2. Gives back sense of 'remaining' variable. Signed-off-by: NAnton Arapov <aarapov@redhat.com> Acked-by: NStephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 10月, 2007 2 次提交
-
-
由 Sean Hefty 提交于
Deadlock condition reported by Kanoj Sarcar <kanoj@netxen.com>. The deadlock occurs when a connection request arrives at the same time that a wildcard listen is being destroyed. A wildcard listen maintains per device listen requests for each RDMA device in the system. The per device listens are automatically added and removed when RDMA devices are inserted or removed from the system. When a wildcard listen is destroyed, rdma_destroy_id() acquires the rdma_cm's device mutex ('lock') to protect against hot-plug events adding or removing per device listens. It then tries to destroy the per device listens by calling ib_destroy_cm_id() or iw_destroy_cm_id(). It does this while holding the device mutex. However, if the underlying iw/ib CM reports a connection request while this is occurring, the rdma_cm callback function will try to acquire the same device mutex. Since we're in a callback, the ib_destroy_cm_id() or iw_destroy_cm_id() calls will block until their callback thread returns, but the callback is blocked waiting for the device mutex. Fix this by re-working how per device listens are destroyed. Use rdma_destroy_id(), which avoids the deadlock, in place of cma_destroy_listen(). Additional synchronization is added to handle device hot-plug events and ensure that the id is not destroyed twice. Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Sean Hefty 提交于
If a user allocates a QP on an rdma_cm_id, the rdma_cm will automatically transition the QP through its states (RTR, RTS, error, etc.) While the QP state transitions are occurring, the QP itself must remain valid. Provide locking around the QP pointer to prevent its destruction while accessing the pointer. This fixes an issue reported by Olaf Kirch from Oracle that resulted in a system crash: "An incoming connection arrives and we decide to tear down the nascent connection. The remote ends decides to do the same. We start to shut down the connection, and call rdma_destroy_qp on our cm_id. ... Now apparently a 'connect reject' message comes in from the other host, and cma_ib_handler() is called with an event of IB_CM_REJ_RECEIVED. It calls cma_modify_qp_err, which for some odd reason tries to modify the exact same QP we just destroyed." Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 13 10月, 2007 1 次提交
-
-
由 Kay Sievers 提交于
This changes the uevent buffer functions to use a struct instead of a long list of parameters. It does no longer require the caller to do the proper buffer termination and size accounting, which is currently wrong in some places. It fixes a known bug where parts of the uevent environment are overwritten because of wrong index calculations. Many thanks to Mathieu Desnoyers for finding bugs and improving the error handling. Signed-off-by: NKay Sievers <kay.sievers@vrfy.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 11 10月, 2007 1 次提交
-
-
由 Stephen Hemminger 提交于
Expansion of original idea from Denis V. Lunev <den@openvz.org> Add robustness and locking to the local_port_range sysctl. 1. Enforce that low < high when setting. 2. Use seqlock to ensure atomic update. The locking might seem like overkill, but there are cases where sysadmin might want to change value in the middle of a DoS attack. Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 10月, 2007 2 次提交
-
-
由 Sean Hefty 提交于
Automatically queue MRA message to decrease the number of retries sent by the remote side during connection establishment. This also has the effect of increasing the overall connection timeout without using a longer retry time in the case of dropped packets. Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Sean Hefty 提交于
The IB CM provides a message received acknowledged (MRA) message that can be sent to indicate that a REQ or REP message has been received, but will require more time to process than the timeout specified by those messages. In many cases, the application may not know how long it will take to respond to a CM message, but the majority of the time, it will usually respond before a retry has been sent. Rather than sending an MRA in response to all messages just to handle the case where a longer timeout is needed, it is more efficient to queue the MRA for sending in case a duplicate message is received. This avoids sending an MRA when it is not needed, but limits the number of times that a REQ or REP will be resent. It also provides for a simpler implementation than generating the MRA based on a timer event. (That is, trying to send the MRA after receiving the first REQ or REP if a response has not been generated, so that it is received at the remote side before a duplicate REQ or REP has been received) Signed-off-by: NSean Hefty <sean.hefty@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-