- 06 9月, 2009 5 次提交
-
-
由 Jack Morgenstein 提交于
Userspace apps are supposed to release all ib device resources if they receive a fatal async event (IBV_EVENT_DEVICE_FATAL). However, the app has no way of knowing when the device has come back up, except to repeatedly attempt ibv_open_device() until it succeeds. However, currently there is no protection against the open succeeding while the device is in being removed following the fatal event. In this case, the open will succeed, but as a result the device waits in the middle of its removal until the new app releases its resources -- and the new app will not do so, since the open succeeded at a point following the fatal event generation. This patch adds an "active" flag to the device. The active flag is set to false (in the fatal event flow) before the "fatal" event is generated, so any subsequent ibv_dev_open() call to the device will fail until the device comes back up, thus preventing the above deadlock. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Arputham Benjamin 提交于
When the mthca driver uses the same name for interrupts for every device in the system. This can make it very confusing trying to work out exactly which device MSI-X interrupts are for. Change the driver to add the PCI name of the device to the interrupt name. Signed-off-by: NArputham Benjamin <abenjamin@sgi.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
mthca_ib_lock_cqs()/mthca_ib_unlock_cqs() are helper functions that lock/unlock both CQs attached to a QP in the proper order to avoid AB-BA deadlocks. Annotate this so sparse can understand what's going on (and warn us if we misuse these functions). Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
mthca_reset.c doesn't have any function annotations, so there's no reason to include <linux/init.h>. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
mthca_config_reg.h was including <asm/page.h> for no reason -- the whole file is just defines of constants, so it's entirely self-contained. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 24 6月, 2009 1 次提交
-
-
由 Alexander Schmidt 提交于
Increment version number for DMEM toleration. Signed-off-by: NAlexander Schmidt <alexs@linux.vnet.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 23 6月, 2009 5 次提交
-
-
由 Roland Dreier 提交于
dma_sync_single() is deprecated now, and the use in mthca is wrong: there should be a dma_sync_single_for_cpu() before touching the memory from the CPU, and a dma_sync_single_for_device() afterwards. Fix this, prompted by a kick in the pants from a patch from FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Faisal Latif 提交于
During cluster testing, one QP was not closed, as FIN is not handled properly when its rexmit count expires or in some cases when RST is is received after sending FIN. The reason is that the cm_id does not get decremented under these conditions. Signed-off-by: NFaisal Latif <faisal.latif@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Faisal Latif 提交于
In nes_query_device(), max_qp_init_rd_atom is incorrectly set to max_qp_wr. This was found when a test application had a dapl async event error. Signed-off-by: NFaisal Latif <faisal.latif@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roel Kluin 提交于
This prevents the memcpy() of a guid_entries element using a negative index. Signed-off-by: NRoel Kluin <roel.kluin@gmail.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Hannes Hering 提交于
Implement toleration of dynamic memory operations and 16 GB gigantic pages, where "toleration" means that the driver can cope with dynamic memory operations that happen before the driver is loaded. While the ehca driver is loaded, dynamic memory operations are still prohibited by returning NOTIFY_BAD from the memory notifier. On module load the driver walks through available system memory, checks for available memory ranges and then registers the kernel internal memory region accordingly. The translation of address ranges is implemented via a 3-level busmap. Signed-off-by: NHannes Hering <hering2@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 16 6月, 2009 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
In the near future, the driver core is going to not allow direct access to the driver_data pointer in struct device. Instead, the functions dev_get_drvdata() and dev_set_drvdata() should be used. These functions have been around since the beginning, so are backwards compatible with all older kernel versions. Cc: Sean Hefty <sean.hefty@intel.com> Cc: Roland Dreier <rolandd@cisco.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: general@lists.openfabrics.org Cc: Christoph Raisch <raisch@de.ibm.com> Acked-by: NHoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 14 6月, 2009 1 次提交
-
-
由 Roland Dreier 提交于
When both MSI-X and legacy INTx fail to generate an interrupt, the driver frees the MSI-X interrupts twice. Fix this by clearing the have_irq flag for the MSI-X interrupts when they are freed the first time. Reported-by: NYinghai Lu <yhlu.kernel@gmail.com> Tested-by: NYinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 06 6月, 2009 1 次提交
-
-
由 Jack Morgenstein 提交于
The ConnectX Programmer's Reference Manual states that the "SO" bit must be set when posting Fast Register and Local Invalidate send work requests. When this bit is set, the work request will be executed only after all previous work requests on the send queue have been executed. (If the bit is not set, Fast Register and Local Invalidate WQEs may begin execution too early, which violates the defined semantics for these operations) This fixes the issue with NFS/RDMA reported in <http://lists.openfabrics.org/pipermail/general/2009-April/059253.html> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Cc: <stable@kernel.org> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 04 6月, 2009 1 次提交
-
-
由 Joachim Fenkes 提交于
All the fields in the control block are nicely right-aligned, so no masking is necessary. Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 28 5月, 2009 3 次提交
-
-
由 Steve Wise 提交于
T3 firmware only supports one WRs worth of page list for fast register work requests. The driver currently allows 2 WRs worth, which doesn't work for T3, so reduce the limit in the driver. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eli Cohen 提交于
The current MTT allocator uses kmalloc() to allocate a buffer for its buddy allocator, and thus is limited in the amount of MTT segments that it can control. As a result, the size of memory that can be registered is limited too. This patch uses a module parameter to control the number of MTT entries that each segment represents, allowing more memory to be registered with the same number of segments. Signed-off-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 16 5月, 2009 1 次提交
-
-
由 Roel Kluin 提交于
With a postfix increment, i is incremented one past 10K/5K before the loop ends, so the error messages will be displayed too soon if the test succeeds on the last iteration. Fix the comparisons to be > instead of >=. Signed-off-by: NRoel Kluin <roel.kluin@gmail.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 14 5月, 2009 5 次提交
-
-
由 Jack Stone 提交于
Remove uneeded casts of void *. Signed-off-by: NJack Stone <jwjstone@fastmail.fm> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Stefan Roscher 提交于
Signed-off-by: NStefan Roscher <stefan.roscher@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Stefan Roscher 提交于
The queue map for flush completion circumvention is only used for kernel space queue pairs. This patch skips the allocation of the queue maps in case the QP is created for userspace. In addition, this patch does not iomap the galpas for kernel usage if the queue pair is only used in userspace. These changes will improve the performance of creation of userspace queue pairs. Signed-off-by: NStefan Roscher <stefan.roscher@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Stefan Roscher 提交于
In case of large queue pairs there is the possibillity of allocation failures due to memory fragmentation when using kmalloc(). To ensure the memory is allocated even if kmalloc() can not find chunks which are big enough, we fall back to allocating the memory with vmalloc(). Signed-off-by: NStefan Roscher <stefan.roscher@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Anton Blanchard 提交于
To improve performance of driver resource allocation, replace vmalloc() calls with kmalloc(). Signed-off-by: NStefan Roscher <stefan.roscher@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 09 5月, 2009 1 次提交
-
-
由 Al Viro 提交于
forgot to unlock superblock before calling deactivate_super()... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 08 5月, 2009 1 次提交
-
-
由 Jack Morgenstein 提交于
The low-level mlx4 driver modified the page-list addresses for fast register work requests post send to big-endian, and set a "present" bit. This caused problems later when the consumer attempted to unmap the pages using the page-list (using the list addresses which were assumed to be still in CPU-endian order). Fix the mlx4 driver to allocate two buffers and use a private buffer for the hardware-format bus addresses. This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1571>, an NFS/RDMA server crash. The cause of the crash was found by Vu Pham of Mellanox. The fix is along the lines suggested by Steve Wise in comment #21 in bug 1571. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 30 4月, 2009 1 次提交
-
-
由 Steve Wise 提交于
When the SQ is flushed, mark the flushed entries as not signaled so the poll logic doesn't re-insert the CQ entry thinking its an out of order completion. The bug can cause the NFS/RDMA server to crash due to processing the same completed work request twice. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 28 4月, 2009 12 次提交
-
-
由 Chien Tung 提交于
Update version number to 1.5.0.0 Signed-off-by: NChien Tung <chien.tin.tung@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Faisal Latif 提交于
If reg_phys_mem() fails, we need to free memory allocated for MPA frame with private data before returning the error. Also move nes_add_ref() after the reg_phys_mem() is successful. Signed-off-by: NFaisal Latif <faisal.latif@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Faisal Latif 提交于
Running large cluster setup, we are hanging after many hours of testing. Fixing this required going over the code and making sure the rexmit entry was properly removed based on the cm_node's state and packet received. Also when receiving a FIN packet, check seq# and make sure there were no errors before calling handle_fin(). Following are the changes done in nes_cm.c: * handle_ack_pkt() needs to return error value, so in case of error, handle_fin() is not called. Some cleanup done while going over the code. * handle_rst_pkt(), handling of cm_node's NES_CM_STATE_LAST_ACK is missing. * process_packet(), in case of FIN only packet is received, call check_seq() before processing. * in handle_fin_pkt(), we are calling cleanup_retrans_entry() for all conditions, even if the packets need to be dropped. Signed-off-by: NFaisal Latif <faisal.latif@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Faisal Latif 提交于
Under heavy load with large cluster testing, it may take longer to receive a response to MPA requests. Change the driver to wait longer after each rexmit to max time value. Signed-off-by: NFaisal Latif <faisal.latif@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Faisal Latif 提交于
check_seq() was not checking if the seq#s have wrapped. Fix it. Signed-off-by: NFaisal Latif <faisal.latif@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Faisal Latif 提交于
When a connect request comes, apbvt should only be set for non-loopback connections. Signed-off-by: NFaisal Latif <faisal.latif@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Chien Tung 提交于
Remove the NES_DEBUG that is causing the compile warning about an unused variable when INFINIBAND_NES_DEBUG is not enabled. Signed-off-by: NChien Tung <chien.tin.tung@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Chien Tung 提交于
/sys/class/infiniband/nes?/fw_ver is not displaying firmware version properly (it shows 0.0.0 with the current code). Fill in the correct firmware version number. Signed-off-by: NChien Tung <chien.tin.tung@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Chien Tung 提交于
With updated PHY firmware for SFP_D, setting the trace length to 1 inch for SFP_D provides a more stable link. Signed-off-by: NChien Tung <chien.tin.tung@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Chien Tung 提交于
Enable repause timer for port 1. Without this setting, under stress, the chip may misbehave. Signed-off-by: NChien Tung <chien.tin.tung@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Chien Tung 提交于
In commit 1b949324 ("RDMA/nes: Fix SFP+ PHY initialization") there is a mistake in the clean up code that removed port 1 CDR loop filter settings for 10G cards other than CX4. Put the correct setting back for appropriate PHY types. Signed-off-by: NChien Tung <chien.tin.tung@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Chien Tung 提交于
Change thermo mitigation code to flip the SerDes1 reference clock to internal, to match the change in commit a4849fc1 ("RDMA/nes: Add wide_ppm_offset parm for switch compatibility"). Signed-off-by: NChien Tung <chien.tin.tung@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 22 4月, 2009 1 次提交
-
-
由 Miroslaw Walukiewicz 提交于
In error paths where a CQ is not created, pbl is not freeed properly. In nes_destroy_cq(), add the corresponding check for nescq->mcrqf to not call nes_free_resource() when it is already done in nes_create_cq(). Signed-off-by: NMiroslaw Walukiewicz <miroslaw.walukiewicz@intel.com> Signed-off-by: NChien Tung <chien.tin.tung@intel.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-