- 08 5月, 2008 7 次提交
-
-
由 John Gregor 提交于
What's fixed: in ipath_cancel_sends() We need to unconditionally set ABORTING. So, swap the tests so the set_bit() isn't shadowed by the &&. If we've disarmed the piobufs, then we need to unconditionally set DISARMED. So, move it out from the overly protective if at the bottom. in sdma_abort_task() Abort_task was written knowing that the SDMA engine would always be reset (and restarted) on error. A recent change broke that fundamental assumption by taking the restart portion and making it conditional on a link status change. But, SDMA can go boom without a link status change in some conditions. Signed-off-by: NJohn Gregor <john.gregor@qlogic.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Dave Olson 提交于
Now that we always use PIO for vl15 on 7220, we could get stuck forever if we happened to run out of PIO buffers from the verbs code, because the setup code wouldn't run; the interrupt was also ignored if SDMA was supported. We also have to reduce the pio update threshold if we have fewer kernel buffers than the existing threshold. Clean up the initialization a bit to get ordering safer and more sensible, and use the existing ipath_chg_kernavail call to do init, rather than doing it separately. Drop unnecessary clearing of pio buffer on pio parity error. Drop incorrect updating of pioavailshadow when exitting freeze mode (software state may not match chip state if buffer has been allocated and not yet written). If we couldn't get a kernel buffer for a while, make sure we are in sync with hardware, mainly to handle the exitting freeze case. Signed-off-by: NDave Olson <dave.olson@qlogic.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Michael Albaugh 提交于
The loop in ipath_kreceive() that processes packets increments the loop-index 'i' once too often, because the exit condition does not depend on it, and is checked after the increment. By adding a check for !last to the iterator in the for loop, we correct that in a way that is not so likely to be re-broken by changes in the loop body. Signed-off-by: NMichael Albaugh <micheal.albaugh@qlogic.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Ralph Campbell 提交于
This patch fixes a bug in the RC responder which generates a completion entry with the wrong opcode when an RDMA WRITE with immediate is received. Signed-off-by: NRalph Campbell <ralph.campbell@qlogic.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Dave Olson 提交于
The semantics of cancel_sends changed, but the code using it was missed. Don't leave sends and pioavail updates disabled, and add a comment as to why the force update is needed. Signed-off-by: NDave Olson <dave.olson@qlogic.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Ralph Campbell 提交于
If a send work request has immediate errors and is not put on the send queue, we shouldn't update any of the QP state. The increment of the SSN wasn't obeying this. Signed-off-by: NRalph Campbell <ralph.campbell@qlogic.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Michael Albaugh 提交于
We warn about prototype chips, but the function that checks for support is also called as a result of a get_portinfo request, which can clutter the logs. Restrict warning to only appear during initialization. Signed-off-by: NMichael Albaugh <michael.albaugh@qlogic.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 07 5月, 2008 2 次提交
-
-
由 Roland Dreier 提交于
Currently, iw_cxgb3 is severely limited on the amount of userspace memory that can be registered in in a single memory region, which causes big problems for applications that expect to be able to register 100s of MB. The problem is that the driver uses a single kmalloc()ed buffer to hold the physical buffer list (PBL) for the entire memory region during registration, which means that 8 bytes of contiguous memory are required for each page of memory being registered. For example, a 64 MB registration will require 128 KB of contiguous memory with 4 KB pages, and it unlikely that such an allocation will succeed on a busy system. This is purely a driver problem: the temporary page list buffer is not needed by the hardware, so we can fix this by writing the PBL to the hardware in page-sized chunks rather than all at once. We do this by splitting the memory registration operation up into several steps: - Allocate PBL space in adapter memory for the full registration - Copy PBL to adapter memory in chunks - Allocate STag and enable memory region This also allows several other cleanups to the __cxio_tpt_op() interface and related parts of the driver. This change leaves the reregister memory region and memory window operations broken, but they already didn't work due to other longstanding bugs, so fixing them will be left to a later patch. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Current iw_cxgb3 code adds PBL memory to the driver's gen_pool in 2 MB chunks. This limits the largest single allocation that can be done to the same size, which means that with 4 KB pages, each of which takes 8 bytes of PBL memory, the largest memory region that can be allocated is 1 GB (256K PBL entries * 4 KB/entry). Remove this limit by adding all the PBL memory in a single gen_pool chunk, if possible. Add code that falls back to smaller chunks if gen_pool_add() fails, which can happen if there is not sufficient contiguous lowmem for the internal gen_pool bitmap. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 06 5月, 2008 1 次提交
-
-
由 Stefan Roscher 提交于
Also remove duplicate assignment of local_ca_ack_delay and change min_t check for local_ca_ack_delay to u8 instead of int. Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 03 5月, 2008 3 次提交
-
-
由 Steve Wise 提交于
Testing on large clusters shows its way too short at 10 secs. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
Remove bad BUG_ON() that can trigger in correct operation from close_con_rpl(). It is possible to get a close_rpl message on a dead connection. The sequence is: - host refs ep for close exchange - host posts close_req - hw posts PEER_ABORT from incoming RST - host marks ep DEAD - host posts ABORT_RPL and releases ep resources - hw posts CLOSE_RPL - host derefs ep and ep freed. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
- Flush the QP only after the HW disables the connection. Currently we flush the QP when transitioning to CLOSING. This exposes a race condition where the HW can complete a RECV WR, for instance, -and- the SW can flush that same WR. - Only call CQ event handlers on flush IFF we actually flushed something. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 01 5月, 2008 1 次提交
-
-
由 Roland Dreier 提交于
When I merged bbf8eed1 ("IB/mlx4: Add support for resizing CQs") I changed things around so that mlx4_ib_alloc_cq_buf() and mlx4_ib_free_cq_buf() were used everywhere they could be. However, I screwed up the number of entries passed into mlx4_ib_alloc_cq_buf() in a couple places -- the function bumps the number of entries internally, so the caller shouldn't add 1 as well. Passing a too-big value for the number of entries to mlx4_ib_free_cq_buf() can cause the cleanup to go off the end of an array and corrupt allocator state in interesting ways. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 30 4月, 2008 11 次提交
-
-
由 Glenn Streiff 提交于
Various cleanups: - Change // to /* .. */ - Place whitespace around binary operators. - Trim down a few long lines. - Some minor alignment formatting for better readability. - Remove some silly tabs. Signed-off-by: NGlenn Streiff <gstreiff@neteffect.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Eric Schneider 提交于
This patch enables the iw_nes module for NetEffect RNICs to support additional PHYs including SFP+ (referred to as ARGUS in the code). Signed-off-by: NEric Schneider <eric.schneider@neteffect.com> Signed-off-by: NGlenn Streiff <gstreiff@neteffect.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Faisal Latif 提交于
Signed-off-by: Faisal Latif <flatif@neteffect.com. Signed-off-by: NGlenn Streiff <gstreiff@neteffect.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Commit cb9fbc5c ("IB: expand ib_umem_get() prototype") changed the mthca userspace ABI to provide a way for userspace to indicate which memory regions need the DMA write barrier attribute. However, it is possible to handle this without breaking existing userspace, by having the mthca kernel driver recognize whether it is talking to old or new userspace, depending on the size of the register MR structure passed in. The only potential drawback of this is that is allows old userspace (which has a bug with DMA ordering on large SGI Altix systems) to continue to run on new kernels, but the advantage of allowing old userspace to continue to work on unaffected systems seems to outweigh this, and we can print a warning to push people to upgrade their userspace. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Olaf Kirch 提交于
When a FMR is unmapped, mthca resets the map count to 0, and clears the upper part of the R_Key which is used as the sequence counter. This poses a problem for RDS, which uses ib_fmr_unmap as a fence operation. RDS assumes that after issuing an unmap, the old R_Keys will be invalid for a "reasonable" period of time. For instance, Oracle processes uses shared memory buffers allocated from a pool of buffers. When a process dies, we want to reclaim these buffers -- but we must make sure there are no pending RDMA operations to/from those buffers. The only way to achieve that is by using unmap and sync the TPT. However, when the sequence count is reset on unmap, there is a high likelihood that a new mapping will be given the same R_Key that was issued a few milliseconds ago. To prevent this, don't reset the sequence count when unmapping a FMR. Signed-off-by: NOlaf Kirch <olaf.kirch@oracle.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Stefan Roscher 提交于
If a lot of QPs fall into Error state at once and the EQ of the respective HCA is too small, it might overrun, causing the eHCA driver to stop processing completion events and calling the application's completion handlers, effectively causing traffic to stop. Fix this by limiting available QPs and CQs to a customizable max count, and determining EQ size based on these counts and a worst-case assumption. Signed-off-by: NStefan Roscher <stefan.roscher@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Hoang-Nam Nguyen 提交于
ehca_create_eq() was assigning a signed return value to an unsiged local variable and then checking if the variable was < 0, which meant that errors were always ignored. Fix this by using one variable for signed integer return values and another for u64 hcall return values. Bug originally found by Roel Kluin <12o3l@tiscali.nl>. Signed-off-by: NHoang-Nam Nguyen <hnguyen@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
Open MPI, Intel MPI and other applications don't respect the iWARP requirement that the client (active) side of the connection send the first RDMA message. This class of application connection setup is called peer-to-peer. Typically once the connection is setup, _both_ sides want to send data. This patch enables supporting peer-to-peer over the chelsio RNIC by enforcing this iWARP requirement in the driver itself as part of RDMA connection setup. Connection setup is extended, when the peer2peer module option is 1, such that the MPA initiator will send a 0B Read (the RTR) just after connection setup. The MPA responder will suspend SQ processing until the RTR message is received and reply-to. In the longer term, this will be handled in a standardized way by enhancing the MPA negotiation so peers can indicate whether they want/need the RTR and what type of RTR (0B read, 0B write, or 0B send) should be sent. This will be done by standardizing a few bits of the private data in order to negotiate all this. However this patch enables peer-to-peer applications now and allows most of the required firmware and driver changes to be done and tested now. Design: - Add a module option, peer2peer, to enable this mode. - New firmware support for peer-to-peer mode: - a new bit in the rdma_init WR to tell it to do peer-2-peer and what form of RTR message to send or expect. - process _all_ preposted recvs before moving the connection into rdma mode. - passive side: defer completing the rdma_init WR until all pre-posted recvs are processed. Suspend SQ processing until the RTR is received. - active side: expect and process the 0B read WR on offload TX queue. Defer completing the rdma_init WR until all pre-posted recvs are processed. Suspend SQ processing until the 0B read WR is processed from the offload TX queue. - If peer2peer is set, driver posts 0B read request on offload TX queue just after posting the rdma_init WR to the offload TX queue. - Add CQ poll logic to ignore unsolicitied read responses. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
cxgb3 only supports 4GB memory regions. The lustre RDMA code uses this attribute and currently has to code around our bad setting. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Steve Wise 提交于
Open MPI and other stress testing exposed a few bad bugs in handling aborts in the middle of a normal close. Fix these by: - serializing abort reply and peer abort processing with disconnect processing - warning (and ignoring) if ep timer is stopped when it wasn't running - cleaning up disconnect path to correctly deal with aborting and dead endpoints - in iwch_modify_qp(), taking a ref on the ep before releasing the qp lock if iwch_ep_disconnect() will be called. The ref is dropped after calling disconnect. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Yevgeny Petrilin 提交于
Extend the mlx4_cq_resize() API with a way to set the "collapsed" flag for the CQ being created. Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 29 4月, 2008 1 次提交
-
-
由 Arthur Kepner 提交于
Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1 when mapping user-allocated CQs with ib_umem_get(). Signed-off-by: NArthur Kepner <akepner@sgi.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Jes Sorensen <jes@sgi.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 4月, 2008 10 次提交
-
-
由 Roland Dreier 提交于
Remove the volatile qualifier from the cq_vbase member of struct nes_hw_cq, and add an rmb() in the one place where it looks like access order might make a difference. As usual, removing a volatile qualifier in a declaration is actually a bug fix, since a volatile qualifier is not sufficient to make sure that aggressively out-of-order CPUs don't reorder things and cause incorrect results. For example, a CPU might speculatively execute reads of other cqe fields before the NIC hardware has written those fields and before it has set the NES_CQE_VALID bit (even though those reads come after the test of the NES_CQE_VALID bit in program order), but then when the CPU actually executes the conditional test of the NES_CQE_VALID, the bit has been set, and the CPU will proceed with the results of the earlier speculative execution and end up using bogus data. This also gets rid of the warning: drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_destroy_cq': drivers/infiniband/hw/nes/nes_verbs.c:1978: warning: passing argument 3 of 'pci_free_consistent' discards qualifiers from pointer target type Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Yevgeny Petrilin 提交于
In addition to mlx4_ib, there will be ethernet and FC consumers of mlx4_core, so move the code for managing kernel doorbells into the core module to avoid having to duplicate this multiple times. Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Joachim Fenkes 提交于
Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Joachim Fenkes 提交于
Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Joachim Fenkes 提交于
Always enable large page support; didn't seem to cause problems for anyone. Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Joachim Fenkes 提交于
Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Joachim Fenkes 提交于
...as required by IB Spec, C10-29. Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Chien Tung 提交于
After PXE boot, the iw_nes driver does a full reset to ensure the card is in a clean state. However, it doesn't wait for firmware to complete its work before issuing a port reset to enable the ports, which leads to problems bringing up the ports. The solution is to wait for firmware to complete its work before proceeding with port reset. This bug was flagged by Roland Dreier <rolandd@cisco.com>. Cc: <stable@kernel.org> Signed-off-by: NChien Tung <ctung@neteffect.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Use NIPQUAD_FMT instead of printing raw 32-bit hex quantities in debugging output. Acked-by: NGlenn Streiff <gstreiff@neteffect.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Removing open-coded MAC formats shrinks the source and the generated code too, eg on x86-64: add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-103 (-103) function old new delta make_cm_node 932 912 -20 nes_netdev_set_mac_address 427 406 -21 nes_netdev_set_multicast_list 1148 1124 -24 nes_probe 2349 2311 -38 Acked-by: NGlenn Streiff <gstreiff@neteffect.com> Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 22 4月, 2008 4 次提交
-
-
由 Roland Dreier 提交于
Match what the PCI specification uses. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
The PCI MSI interface is stubbed out properly so that all the functions just return failure if PCI_MSI=n, so there's no reason to have "#ifdef CONFIG_PCI_MSI" blocks in ipath_iba7220.c. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
Before IBA7220 support was added, the ipath driver didn't support any hardware unless PCI_MSI and/or HT_IRQ was enabled. However, the IBA7220 can generate INTx interrupts, so it makes sense to allow the driver to be build even if PCI_MSI=n and HT_IRQ=n. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
由 Roland Dreier 提交于
The new IBA7220 code added a call to ipath_init_iba7220_funcs() that is compiled unconditionally, but only built the IBA7220 code if PCI_MSI is enabled. Fix this by building the IBA7220 file unconditonally. This fixes build breakage when PCI_MSI=n, HT_IRQ=y and INFINIBAND_IPATH=y reported by Ingo Molnar <mingo@elte.hu>: drivers/built-in.o: In function `ipath_init_one': ipath_driver.c:(.devinit.text+0x1e5bc): undefined reference to `ipath_init_iba7220_funcs' Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-