Commits · 9f81036c54ed1f860d2807c5a6aa4f2b30c21204 · openeuler / Kernel

May 22, 2007: 3 commits

IB/cm: Improve local id allocation · 9f81036c

Committed by Michael S. Tsirkin on May 21, 2007

The IB CM uses an idr for local id allocations, with a running counter
as start_id.  This fails to generate distinct ids if

1. An id is constantly created and destroyed
2. A chunk of ids just beyond the current next_id value is occupied

This in turn leads to an increased chance of a connection request being
mis-detected as a duplicate, sometimes for several retries, until
next_id gets past the block of allocated ids. This has been observed
in practice.

As a fix, remember the last id allocated and start immediately above it.
This also fixes a problem with the old code, where next_id might
overflow and become negative.
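The allocation policy described above can be sketched in C. This is a minimal model, not the CM code: a small bitmap (the hypothetical `struct id_alloc`, with an assumed `ID_SPACE` of 8) stands in for the kernel's idr, and the search starts immediately above the last id handed out and wraps around.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fixed-size id space; the real CM uses an idr, not a bitmap. */
#define ID_SPACE 8

struct id_alloc {
	bool used[ID_SPACE];
	int  last;	/* last id handed out; -1 before the first allocation */
};

/* Search starting immediately above the last allocated id, wrapping around.
 * A just-freed id is not reused right away, and a run of busy ids past the
 * cursor is skipped in a single pass. Returns -1 if every id is in use. */
static int id_get(struct id_alloc *a)
{
	for (int i = 1; i <= ID_SPACE; i++) {
		int id = (a->last + i) % ID_SPACE;

		if (!a->used[id]) {
			a->used[id] = true;
			a->last = id;	/* always in [0, ID_SPACE): never negative */
			return id;
		}
	}
	return -1;
}

static void id_put(struct id_alloc *a, int id)
{
	assert(id >= 0 && id < ID_SPACE);
	a->used[id] = false;
}
```

Because the cursor is reduced modulo `ID_SPACE`, it stays non-negative, and a freed id is handed out again only after the search has wrapped past it.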
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IPoIB/cm: Fix SRQ WR leak · 518b1646

Committed by Michael S. Tsirkin on May 21, 2007

SRQ WR leakage has been observed with IPoIB/CM: e.g. flipping ports on
and off will, with time, leak out all WRs and then all connections
will start getting RNR NAKs.  Fix this in the way suggested by the spec:
move the QP being destroyed to the error state, wait for the "Last WQE
Reached" event, and then post a WR on a "drain QP" connected to the same
CQ.  Once we observe a completion on the drain QP, it's safe to call
ib_destroy_qp.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/ipoib: Fix typos in error messages · 24bd1e4e

Committed by Michael S. Tsirkin on May 18, 2007

Trivial error message fixups.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

May 21, 2007: 2 commits

IB/mlx4: Check if SRQ is full when posting receive · 56a8c8b6

Committed by Roland Dreier on May 20, 2007

Make mlx4_post_srq_recv() fail if the SRQ is full (head == tail).
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mlx4: Pass send queue sizes from userspace to kernel · 2446304d

Committed by Eli Cohen on May 17, 2007

Pass the number of WQEs for the send queue and their size from userspace
to the kernel to avoid having to keep the QP size calculations in sync
between the kernel driver and libmlx4. This fixes a bug seen with the
current mlx4_ib driver and current libmlx4 caused by a difference in the
calculated sizes for SQ WQEs. Also, this gives more flexibility for
userspace to experiment with using multiple WQE BBs for a single SQ WQE.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

May 19, 2007: 13 commits

IB/mlx4: Fix check of opcode in mlx4_ib_post_send() · 59b0ed12

Committed by Roland Dreier on May 19, 2007

wr->opcode is invalid if it's >= ARRAY_SIZE(mlx4_ib_opcode), not just
strictly >.

This was spotted by the Coverity checker (CID 1643).
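A minimal sketch of the corrected bounds check, assuming a hypothetical four-entry translation table (`hw_opcode` and `translate_opcode()` are illustrative names, and the table values are placeholders, not the mlx4 ones):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical opcode translation table; the values are placeholders. */
static const unsigned int hw_opcode[] = { 0x08, 0x09, 0x0a, 0x10 };

/* Valid indices are 0 .. ARRAY_SIZE - 1, so the rejection test must be >=.
 * With a strict > test, opcode == ARRAY_SIZE(hw_opcode) would read one
 * element past the end of the table. */
static int translate_opcode(unsigned int opcode, unsigned int *out)
{
	if (opcode >= ARRAY_SIZE(hw_opcode))
		return -EINVAL;
	*out = hw_opcode[opcode];
	return 0;
}
```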
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions · 65adfa91

Committed by Michael S. Tsirkin on May 14, 2007

According to the IB spec, a QP can be moved from RESET back to RESET
or to the ERROR state, but mlx4 firmware does not support this and
returns an error if we try.  Fix the RESET to RESET transition by
just returning 0 without doing anything, and fix RESET to ERROR by
moving the QP from RESET to INIT with dummy parameters and then
transitioning from INIT to ERROR.
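The workaround can be modeled as a small state machine. In this sketch `fw_modify()` is a hypothetical stand-in for the firmware command that, as described above, rejects RESET to RESET and RESET to ERROR; the "dummy parameters" used for the INIT step are not modeled, and transitions out of other states are accepted unchecked for brevity.

```c
#include <assert.h>
#include <errno.h>

enum qp_state { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, QPS_ERR };

struct qp { enum qp_state state; };

/* Stand-in for the firmware MODIFY_QP command: out of RESET it only
 * accepts RESET -> INIT, mirroring the limitation described above. */
static int fw_modify(struct qp *qp, enum qp_state to)
{
	if (qp->state == QPS_RESET && to != QPS_INIT)
		return -EINVAL;
	qp->state = to;
	return 0;
}

/* Driver-side workaround that hides the firmware limitation from callers. */
static int modify_qp(struct qp *qp, enum qp_state to)
{
	if (qp->state == QPS_RESET && to == QPS_RESET)
		return 0;		/* RESET -> RESET: nothing to do */

	if (qp->state == QPS_RESET && to == QPS_ERR) {
		/* Go RESET -> INIT first, then INIT -> ERROR. */
		int err = fw_modify(qp, QPS_INIT);

		if (err)
			return err;
		return fw_modify(qp, QPS_ERR);
	}

	return fw_modify(qp, to);
}
```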
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mthca: Fix RESET to ERROR transition · b18aad71

Committed by Michael S. Tsirkin on May 14, 2007

According to the IB spec, a QP can be moved from RESET to the ERROR
state, but mthca firmware does not support this and returns an error if
we try. Work around this FW limitation by moving the QP from RESET to
INIT with dummy parameters and then transitioning from INIT to ERROR.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mlx4: Set GRH:HopLimit when sending globally routed MADs · 15261303

Committed by Roland Dreier on May 19, 2007

This is the same issue discovered in mthca by Rolf Manderscheid
<rvm@obsidianresearch.com>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mthca: Set GRH:HopLimit when building MLX headers · 3f37cae6

Committed by Rolf Manderscheid on May 17, 2007

Global CM packets used by rdma_cm were being sent with a GRH:HopLimit
of zero, causing them to be dropped by the router. The problem is a
missing initialization of the hop_limit field in mthca_read_ah(),
which was called by build_mlx_header() when sending a MAD on QP1.
Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mlx4: Fix check of max_qp_dest_rdma in modify QP · 1f8f7b7a

Committed by Eli Cohen on May 17, 2007

max_qp_dest_rdma is already in natural units - no need to shift.  This
was discovered by a test that deliberately requests more outstanding
atomic operations than the device supports.

Found by Sagi Rotem at Mellanox.
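A sketch of the corrected comparison, under the assumption that the old code shifted the device limit (treating it as a log2 value) before comparing it with the requested number of outstanding atomic operations; `struct caps` and `check_dest_rd_atomic()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device capability, stored as a plain count ("natural units"). */
struct caps { int max_qp_dest_rdma; };

static int check_dest_rd_atomic(const struct caps *caps, int requested)
{
	/* Assumed buggy form: requested > (1 << caps->max_qp_dest_rdma).
	 * That treats a count of 4 as 1 << 4 = 16 and lets too-large
	 * requests through; compare against the count directly instead. */
	if (requested > caps->max_qp_dest_rdma)
		return -EINVAL;
	return 0;
}
```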
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mthca: Fix use-after-free on device restart · de57c9f1

Committed by Ali Ayoub on May 17, 2007

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/ehca: Return proper error code if register_mr fails · bd5a6ccc

Committed by Hoang-Nam Nguyen on May 16, 2007

Set the return code of ehca_register_mr() to ENOMEM if the corresponding
firmware call fails because it is out of resources. Some other error codes
were explicitly mapped to EINVAL -- just remove those cases so they
get mapped to the default case, which already returns EINVAL anyway.
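The simplified error mapping can be sketched as a switch in which only the out-of-resources case is special and every other failure falls through to `default`. The `HV_*` return codes and `map_hv_ret()` are hypothetical placeholders, not the real pSeries hcall values or ehca helpers.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical hypervisor return codes; the real H_* values differ. */
enum hv_ret { HV_OK, HV_NO_RESOURCES, HV_BAD_PARAM, HV_BAD_MODE };

static int map_hv_ret(enum hv_ret ret)
{
	switch (ret) {
	case HV_OK:
		return 0;
	case HV_NO_RESOURCES:
		return -ENOMEM;	/* firmware ran out of resources */
	default:
		return -EINVAL;	/* all other failures; no per-code cases needed */
	}
}
```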
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IPoIB: Handle P_Key table reordering · 26bbf13c

Committed by Yosef Etigin on May 19, 2007

SM reconfiguration or failover may shuffle the values in the P_Key
table. Right now, IPoIB only queries for the P_Key index
once when it creates the device QP, and hence there are problems if the
index of a P_Key value changes.  Fix this by using the PKEY_CHANGE event
to trigger a recheck of the P_Key index.
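The fix amounts to re-resolving a cached index from a stable value whenever the table may have changed. A minimal sketch, with hypothetical `struct port`, `struct ipoib_if`, `find_pkey()`, and `on_pkey_change()` standing in for the real port query and event handler:

```c
#include <assert.h>
#include <stdint.h>

#define PKEY_TBL_LEN 4	/* hypothetical table length */

struct port { uint16_t pkey_tbl[PKEY_TBL_LEN]; };

struct ipoib_if {
	uint16_t pkey;		/* the P_Key value the interface is bound to */
	int      pkey_index;	/* cached table position; -1 if not present */
};

/* Linear search of the table by value. */
static int find_pkey(const struct port *p, uint16_t pkey)
{
	for (int i = 0; i < PKEY_TBL_LEN; i++)
		if (p->pkey_tbl[i] == pkey)
			return i;
	return -1;
}

/* P_Key change event: the SM may have reordered the table, so the cached
 * index cannot be trusted; look the value up again. */
static void on_pkey_change(struct ipoib_if *dev, const struct port *p)
{
	dev->pkey_index = find_pkey(p, dev->pkey);
}
```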
Signed-off-by: Yosef Etigin <yosefe@voltaire.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/core: Use start_port() and end_port() · 1af4c435

Committed by Roland Dreier on May 19, 2007

Clean up ib_query_port() and ib_modify_port() slightly by using the 
just-added start_port() and end_port() helpers.
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/core: Add helpers for uncached GID and P_Key searches · 5eb620c8

Committed by Yosef Etigin on May 14, 2007

Add ib_find_gid() and ib_find_pkey() functions that use uncached device
queries. The calls might block, but the results are always up to date.
Cache P_Key and GID table lengths in core to avoid extra port info queries.
Signed-off-by: Yosef Etigin <yosefe@voltaire.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/ipath: Fix potential deadlock with multicast spinlocks · 8b8c8bca

Committed by Roland Dreier on May 19, 2007

Lockdep found the following potential deadlock between mcast_lock and
n_mcast_grps_lock: mcast_lock is taken from both interrupt context and
process context, so spin_lock_irqsave() must be used to take it.
n_mcast_grps_lock is only taken from process context, so at first it
seems safe to take it with plain spin_lock(); however, it also nests
inside mcast_lock, and hence we could deadlock:

  cpu A                                   cpu B
    ipath_mcast_add():
      spin_lock_irq(&mcast_lock);

                                            ipath_mcast_detach():
                                              spin_lock(&n_mcast_grps_lock);

                                            <enter interrupt>

                                            ipath_mcast_find():
                                              spin_lock_irqsave(&mcast_lock);

      spin_lock(&n_mcast_grps_lock);

Fix this by using spin_lock_irq() to take n_mcast_grps_lock.
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/core: Free umem when mm is already gone · 7b82cd8e

Committed by Eli Cohen on May 14, 2007

Free the umem when the task's mm has already been destroyed by the time
ib_umem_release() gets called.

Found by Dotan Barak at Mellanox.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

May 15, 2007: 14 commits

IPoIB/cm: Optimize stale connection detection · 7c5b9ef8

Committed by Michael S. Tsirkin on May 14, 2007

In the presence of some running RX connections, we repeat
queue_delayed_work calls every 4 RX WRs, which is a waste.  It's enough
to start the stale task when the first passive connection is added, and
rerun it every IPOIB_CM_RX_DELAY as long as there are outstanding
passive connections.

This removes some code from RX data path.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mthca: Set cleaned CQEs back to HW ownership when cleaning CQ · bd18c112

Committed by Michael S. Tsirkin on May 14, 2007

mthca_cq_clean() updates the CQ consumer index without moving CQEs
back to HW ownership.  As a result, the same WRID might get reported
twice, resulting in a use-after-free.  This was observed in IPoIB CM.
Fix by moving all freed CQEs to HW ownership.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=617>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mthca: Fix posting >255 recv WRs for Tavor · 3e28c56b

Committed by Michael S. Tsirkin on May 14, 2007

Fix posting lists of > 255 receive WRs for Tavor: rq.next_ind must
be updated after each doorbell, otherwise the next doorbell will use an
incorrect index.

Found by Ronni Zimmermann at Mellanox.
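A sketch of posting a long receive list in doorbell-sized chunks. The per-doorbell limit of 255 WQEs, `struct rq`, and `ring_doorbell()` are assumptions for illustration (the driver's actual constant and structures may differ); the point is that `next_ind` advances after every doorbell, not only once at the end of the list.

```c
#include <assert.h>

/* Assumed limit: one receive doorbell announces at most this many WQEs. */
#define MAX_WQES_PER_DB 255

struct rq {
	unsigned int next_ind;	/* ring index of the first unannounced WQE */
	unsigned int size;	/* number of WQEs in the ring */
	unsigned int doorbells;	/* doorbells rung so far (for illustration) */
	unsigned int last_db_start, last_db_count;
};

/* Hypothetical doorbell: records which WQEs it announced. */
static void ring_doorbell(struct rq *rq, unsigned int start, unsigned int count)
{
	rq->doorbells++;
	rq->last_db_start = start;
	rq->last_db_count = count;
}

/* Post nreq receive WQEs, ringing a doorbell every MAX_WQES_PER_DB WQEs.
 * If next_ind were only updated after the loop, every doorbell after the
 * first would announce WQEs starting from a stale index. */
static void post_recv(struct rq *rq, unsigned int nreq)
{
	unsigned int pending = 0;

	for (unsigned int i = 0; i < nreq; i++) {
		/* ... build WQE at (rq->next_ind + pending) % rq->size ... */
		if (++pending == MAX_WQES_PER_DB) {
			ring_doorbell(rq, rq->next_ind, pending);
			rq->next_ind = (rq->next_ind + pending) % rq->size;
			pending = 0;
		}
	}
	if (pending) {
		ring_doorbell(rq, rq->next_ind, pending);
		rq->next_ind = (rq->next_ind + pending) % rq->size;
	}
}
```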
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/cma: Add check to validate that cm_id is bound to a device · 6c719f5c

Committed by Sean Hefty on May 7, 2007

Several checks in the rdma_cm check against the state of the
cm_id, but only to validate that the cm_id is bound to an underlying
transport specific CM and an RDMA device.  Make the check explicit
in what we're trying to check for, since we're not synchronizing
against the cm_id state.

This will allow a user to disconnect a cm_id or reject a connection
after receiving a device removal event.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/cma: Fix synchronization with device removal in cma_iw_handler · be65f086

Committed by Sean Hefty on May 7, 2007

The cma_iw_handler needs to validate the state of the rdma_cm_id before
processing a new connection request to ensure that a device removal is
not already being processed for the same rdma_cm_id. Without the state
check, the user can receive simultaneous callbacks for the same cm_id, or
a callback after they've destroyed the cm_id.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/cma: Simplify device removal handling code · 8aa08602

Committed by Sean Hefty on May 7, 2007

Add a new routine and rename another to encapsulate common code for
synchronizing with device removal.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/ehca: Disable scaling code by default, bump version number · 4e430dcb

Committed by Joachim Fenkes on May 9, 2007

- Scaling code is still considered experimental, so disable it by default
- Increase version to SVNEHCA_0023
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/ehca: Beautify sysfs attribute code and fix compiler warnings · bba9b601

Committed by Joachim Fenkes on May 9, 2007

eHCA's sysfs attributes are now being created via sysfs_create_group(),
making the process neatly table-driven. The return value is checked, thus
fixing a few compiler warnings.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/ehca: Remove _irqsave, move #ifdef · c7a14939

Committed by Joachim Fenkes on May 9, 2007

- In ehca_process_eq(), we're IRQ safe throughout the whole function, so we
  don't need another _irqsave in the middle of flight.

- take_over_work() is only called by comp_pool_callback(), so it can move
  into the same #ifdef block.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/ehca: Fix AQP0/1 QP number · c55a0ddd

Committed by Hoang-Nam Nguyen on May 9, 2007

AQP0/1 should report qp_num={0|1} and the actual QP# should be stored
in struct ehca_qp, not the other way round.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/ehca: Correctly set GRH mask bit in ehca_modify_qp() · 92761cda

Committed by Joachim Fenkes on May 9, 2007

The driver needs to always supply the "GRH present" flag to the
hypervisor, whether it's true or false. Not supplying it (i.e. not
setting the corresponding mask bit) amounts to a "perhaps", which we
don't want.
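The "always supply the flag" rule can be sketched with a hypothetical control block in which a field is only honored if its mask bit is set; `struct mqpcb`, `MQPCB_MASK_GRH_PRESENT`, and `set_grh()` are illustrative names, not the ehca definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical control block: the hypervisor applies a field only if the
 * matching bit is set in 'mask'; an unset bit means "unspecified". */
#define MQPCB_MASK_GRH_PRESENT (1u << 0)

struct mqpcb {
	uint32_t mask;
	bool     grh_present;
};

static void set_grh(struct mqpcb *cb, bool has_grh)
{
	cb->grh_present = has_grh;
	/* Set the mask bit unconditionally, including when has_grh is false,
	 * so the hypervisor receives an explicit "no GRH", not a "perhaps". */
	cb->mask |= MQPCB_MASK_GRH_PRESENT;
}
```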
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/ehca: Serialize hypervisor calls in ehca_register_mr() · 5d88278e

Committed by Stefan Roscher on May 9, 2007

Some pSeries hypervisor versions show a race condition in the allocate
MR hCall.  Serialize this call per adapter to circumvent this problem.
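Serializing the call per adapter means holding a lock owned by that adapter across the hcall. A minimal user-space sketch, with a pthread mutex standing in for the kernel lock and a hypothetical `hcall_alloc_mr()` standing in for the hypervisor call; the `in_flight` counters exist only to make the serialization observable.

```c
#include <assert.h>
#include <pthread.h>

struct adapter {
	pthread_mutex_t mr_lock;	/* serializes "allocate MR" calls */
	int in_flight;			/* callers currently inside the call */
	int max_in_flight;		/* highest concurrency observed */
	int next_mr;
};

/* Hypothetical stand-in for the racy hypervisor call. */
static int hcall_alloc_mr(struct adapter *a)
{
	int now = ++a->in_flight;

	if (now > a->max_in_flight)
		a->max_in_flight = now;
	int mr = a->next_mr++;
	a->in_flight--;
	return mr;
}

/* Only one caller per adapter can be inside the hcall at any time. */
static int register_mr(struct adapter *a)
{
	pthread_mutex_lock(&a->mr_lock);
	int mr = hcall_alloc_mr(a);
	pthread_mutex_unlock(&a->mr_lock);
	return mr;
}
```

Each adapter has its own lock, so registrations on different adapters still proceed in parallel; only calls on the same adapter are serialized.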
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/ipath: Shadow the gpio_mask register · 8f140b40

Committed by Arthur Jones on May 10, 2007

Once upon a time, GPIO interrupts were rare. But then a chip bug in
the waldo series forced the use of a GPIO interrupt to signal packet
reception. This greatly increased the frequency of GPIO interrupts
which have the gpio_mask bits set on the waldo chips. Other bits in
the gpio_status register are used for I2C clock and data lines; these
bits are usually on. An "unlikely" annotation left over from the old
days was improperly applied to these bits, and an unnecessary chip
MMIO read was being performed in the interrupt fast path on waldo.

Remove the stagnant unlikely annotation in the interrupt handler and
keep a shadow copy of the gpio_mask register to avoid the slow mmio
read when testing for interruptable GPIO bits.
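A sketch of the shadow-register pattern: every write updates both the register and a software copy, and the interrupt fast path reads only the copy. `struct dev`, `write_gpio_mask()`, and `pending_gpio_irqs()` are hypothetical, and a plain `volatile` field stands in for the MMIO register.

```c
#include <assert.h>
#include <stdint.h>

struct dev {
	volatile uint64_t gpio_mask_reg;	/* stands in for the MMIO register */
	uint64_t gpio_mask_shadow;		/* software copy of the register */
};

/* All writers go through this helper, so the shadow never goes stale. */
static void write_gpio_mask(struct dev *d, uint64_t mask)
{
	d->gpio_mask_shadow = mask;
	d->gpio_mask_reg = mask;
}

/* Interrupt fast path: test status bits against the shadow copy instead of
 * issuing a slow read of the mask register. */
static uint64_t pending_gpio_irqs(const struct dev *d, uint64_t gpio_status)
{
	return gpio_status & d->gpio_mask_shadow;
}
```

The pattern is only safe if every write to the register goes through the helper; a write that bypassed it would leave the shadow out of date.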
Signed-off-by: Arthur Jones <arthur.jones@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mlx4: Fix uninitialized spinlock for 32-bit archs · 26c6bc7b

Committed by Jack Morgenstein on May 13, 2007

uar_lock spinlock was used in mlx4_ib_cq_arm without being initialized
(this only affects 32-bit archs, because uar_lock is not used on
64-bit archs and MLX4_INIT_DOORBELL_LOCK() is a NOP).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

May 10, 2007: 2 commits

[S390] Kconfig: menus with depends on HAS_IOMEM. · e25df120

Committed by Martin Schwidefsky on May 10, 2007

Add "depends on HAS_IOMEM" to a number of menus to make them
disappear for s390 which does not have I/O memory.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

Committed by Rafael J. Wysocki on May 9, 2007

Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress.  This
patch introduces such notifications and causes them to be used during
suspend and resume transitions.  It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).

[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8bb78442

May 9, 2007: 5 commits

IB/mlx4: Add a driver for Mellanox ConnectX InfiniBand adapters · 225c7b1f

Committed by Roland Dreier on May 8, 2007

Add an InfiniBand driver for Mellanox ConnectX adapters.  Because
these adapters can also be used as ethernet NICs and Fibre Channel 
HBAs, the driver is split into two modules: 
 
  mlx4_core: Handles low-level things like device initialization and 
    processing firmware commands.  Also controls resource allocation 
    so that the InfiniBand, ethernet and FC functions can share a 
    device without stepping on each other. 
 
  mlx4_ib: Handles InfiniBand-specific things; plugs into the 
    InfiniBand midlayer. 
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB: Put rlimit accounting struct in struct ib_umem · 1bf66a30

Committed by Roland Dreier on April 18, 2007

When memory pinned with ib_umem_get() is released, ib_umem_release()
needs to subtract the amount of memory being unpinned from
mm->locked_vm.  However, ib_umem_release() may be called with
mm->mmap_sem already held for writing if the memory is being released
as part of an munmap() call, so it is sometimes necessary to defer
this accounting into a workqueue.

However, the work struct used to defer this accounting is dynamically
allocated before it is queued, so there is the possibility of failing
that allocation.  If the allocation fails, then ib_umem_release has no
choice except to bail out and leave the process with a permanently
elevated locked_vm.

Fix this by allocating the structure to defer accounting as part of
the original struct ib_umem, so there's no possibility of failing a
later allocation if creating the struct ib_umem and pinning memory
succeeds.
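The idea of embedding the deferred-work item in the object it accounts for can be sketched in user-space C. `struct work`, `struct umem`, the single-slot `queued` pointer, and the `lock_held` flag (modeling `mmap_sem` being held for writing) are hypothetical simplifications, not the kernel workqueue API or the real `struct ib_umem`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct work {
	void (*fn)(struct work *w);
};

struct umem {
	size_t npages;
	struct work acct_work;	/* embedded: nothing to allocate at release */
};

static size_t locked_vm;	/* models mm->locked_vm */
static struct work *queued;	/* models a single-slot workqueue */

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void account_release(struct work *w)
{
	struct umem *m = container_of(w, struct umem, acct_work);

	locked_vm -= m->npages;
	free(m);
}

/* If the lock guarding locked_vm is already held, defer the update. The
 * work item lives inside struct umem, so deferring cannot fail. */
static void umem_release(struct umem *m, int lock_held)
{
	if (!lock_held) {
		account_release(&m->acct_work);
		return;
	}
	m->acct_work.fn = account_release;
	queued = &m->acct_work;
}

static struct umem *umem_get(size_t npages)
{
	struct umem *m = malloc(sizeof(*m));

	if (!m)
		return NULL;	/* the only allocation happens up front */
	m->npages = npages;
	locked_vm += npages;
	return m;
}
```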
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules · f7c6a7b5

Committed by Roland Dreier on March 4, 2007

Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.

Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.

This has a number of advantages:
 - It is better design from the standpoint of making generic code a
   library that can be used or overridden by device-specific code as
   the details of specific devices dictate.
 - Drivers that do not need to pin userspace memory regions do not
   need to take the performance hit of calling ib_umem_get().  For
   example, although I have not tried to implement it in this patch,
   the ipath driver should be able to avoid pinning memory and just
   use copy_{to,from}_user() to access userspace memory regions.
 - Buffers that need special mapping treatment can be identified by
   the low-level driver.  For example, it may be possible to solve
   some Altix-specific memory ordering issues with mthca CQs in
   userspace by mapping CQ buffers with extra flags.
 - Drivers that need to pin and DMA map userspace memory for things
   other than memory regions can use ib_umem_get() directly, instead
   of hacks using extra parameters to their reg_phys_mr method.  For
   example, the mlx4 driver that is pending being merged needs to pin
   and DMA map QP and CQ buffers, but it does not need to create a
   memory key for these buffers.  So the cleanest solution is for mlx4
   to call ib_umem_get() in the create_qp and create_cq methods.
Signed-off-by: Roland Dreier <rolandd@cisco.com>

inode numbering: change libfs sb creation routines to avoid collisions with their root inodes · 1a1c9bb4

Committed by Jeff Layton on May 8, 2007

This patch makes it so that simple_fill_super and get_sb_pseudo assign their
root inodes to be number 1. It also fixes up a couple of callers of
simple_fill_super that were passing in files arrays that had an index at
number 1, and adds a warning for any caller that sends in such an array.

It would have been nice to have made it so that it wasn't possible to make
such a collision, but some callers need to be able to control what inode
number their entries get, so I think this is the best that can be done.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

header cleaning: don't include smp_lock.h when not used · e63340ae

Committed by Randy Dunlap on May 8, 2007

Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

May 7, 2007: 1 commit

IPoIB: Convert to NAPI · 8d1cc86a

Committed by Roland Dreier on May 6, 2007

Convert the IP-over-InfiniBand network device driver over to using
NAPI to handle completions for the main CQ.  This covers all receives
as well as datagram mode sends; send completions for connected mode
connections are still handled from interrupt context.
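The NAPI pattern can be sketched without the kernel API: the interrupt handler only disarms the interrupt and schedules polling, and the poll routine processes at most `budget` completions, re-arming the interrupt only once the CQ is drained. `struct cq`, `cq_interrupt()`, and `cq_poll()` are hypothetical; the real 2007 NAPI interface and the IPoIB code differ in detail (for example, in how they close the race between draining the CQ and re-arming it).

```c
#include <assert.h>
#include <stdbool.h>

struct cq {
	int  ready;		/* completions waiting to be processed */
	bool irq_armed;		/* whether the CQ interrupt is enabled */
	int  processed;		/* total completions handled */
};

/* Interrupt handler: no completion work here, just hand off to polling. */
static void cq_interrupt(struct cq *cq, bool *poll_scheduled)
{
	cq->irq_armed = false;
	*poll_scheduled = true;
}

/* Poll routine: handle at most 'budget' completions. If the CQ empties
 * before the budget is used up, re-arm the interrupt; otherwise return the
 * full budget so the poll is scheduled again later. */
static int cq_poll(struct cq *cq, int budget)
{
	int done = 0;

	while (done < budget && cq->ready > 0) {
		cq->ready--;
		cq->processed++;
		done++;
	}
	if (done < budget)
		cq->irq_armed = true;
	return done;
}
```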
Signed-off-by: Roland Dreier <rolandd@cisco.com>