Commits · f54c77dd9953241be8b63f9239facdde82b3eb18 · openeuler / Kernel

23 Oct, 2012 · 1 commit

RDMA/cxgb4: Don't free chunk that we have failed to allocate · 32c631f9

Committed by Thadeu Lima de Souza Cascardo on Oct 12, 2012

In the error path of registering memory when there's a failure to
allocate a chunk from the memory pool, we try to free the same chunk
we just failed to allocate, which will BUG().
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

32c631f9

19 Oct, 2012 · 3 commits

IB/mlx4: Synchronize cleanup of MCGs in MCG paravirtualization · bef83ed9

Committed by Eli Cohen on Oct 17, 2012

A client re-register event invokes cleanup of all MCGs. This is
required to protect against misbehaving guests corrupting the
join/leave database. However, since cleaning up the MCGs is a heavy
operation, it is pushed to a work queue for further processing.
Client re-register is also propagated to ULPs (e.g., IPoIB).

However, since the cleanup is performed in a workqueue, the ULP could
leave and re-join groups before the cleanup occurs. In this case,
when the cleanup takes place, it prunes the (newly-joined) MCGs and
the ULP is left without actual MCGs while believing it joined them.

Fix this by setting the flushing flag before invoking the cleanup task
and clearing it after flushing is complete.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

bef83ed9

IB/mlx4: Fix QP1 P_Key processing in the Primary Physical Function (PPF) · 2c75d2cc

Committed by Jack Morgenstein on Oct 17, 2012

In the MAD paravirtualization code, one of the checks performed when
forwarding QP1 (GSI) packets from wire to slave was a P_Key check: the
P_Key received in the MAD must be present in the guest's paravirtualized
P_Key table, and at least one of the (packet P_Key, guest P_Key) must
be a full-membership P_Key.

However, if everyone involved has only limited membership in the
default P_Key, then packets sent by full-member remote hosts arrive at
the PPF but are not passed on to the VFs with the current P_Key check.

Fix this as follows:

1. Don't care if P_Key received over wire is full or not. If it
   successfully passed HW checks on the real QP1, then simply pass it
   to guest regardless of whether the guest has full or limited
   membership in its P_Key table.

2. If the guest (including paravirtualized master) has both full and
   limited P_Key forms in its table, preferentially pass the
   paravirtualized P_Key index of the full P_Key form in the tunnel
   header.

3. In the multicast join flow (mlx4/mcg.c), use the index for the
   default P_Key (wherever it is located) in replies generated from
   within the mcg module (previously, P_Key index 0 was used in all
   cases).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

2c75d2cc
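
Step 2 of commit 2c75d2cc (prefer the full-membership form when a guest table holds both forms) can be sketched in user-space C. This is only an illustration, not the driver code: the function name and the table layout are hypothetical, and it assumes the InfiniBand convention that the top bit of a 16-bit P_Key is the membership bit and the low 15 bits identify the partition.

```c
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u  /* membership bit (assumed IB convention) */
#define PKEY_BASE_MASK   0x7fffu  /* low 15 bits identify the partition */

/*
 * Hypothetical helper: return the index in a guest's P_Key table that
 * matches the partition of wire_pkey, preferring a full-membership entry
 * over a limited one. Returns -1 if the partition is not in the table.
 */
int find_guest_pkey_index(const uint16_t *table, int len, uint16_t wire_pkey)
{
    int limited_idx = -1;

    for (int i = 0; i < len; i++) {
        if ((table[i] & PKEY_BASE_MASK) != (wire_pkey & PKEY_BASE_MASK))
            continue;            /* different partition */
        if (table[i] & PKEY_FULL_MEMBER)
            return i;            /* full form wins immediately */
        if (limited_idx < 0)
            limited_idx = i;     /* remember the first limited match */
    }
    return limited_idx;
}
```

Note that the membership bit of the wire P_Key is ignored, which mirrors step 1: a packet that passed the hardware checks is forwarded whether it carries a full or a limited P_Key.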

IB/mlx4: Fix build error on platforms where UL is not 64 bits · 8a095030

Committed by Doug Ledford on Oct 17, 2012

Line 110 uses UL as a compiler cast for the 0x constant, but it's not
large enough to hold a 64-bit value on a 32-bit arch.
Signed-off-by: Doug Ledford <dledford@redhat.com>

[ Use "-1" instead of "FFFFFFFFFFFFFFFFULL".  - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>

8a095030
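
The width problem behind commit 8a095030 can be shown in a few lines of portable C. On a 32-bit (ILP32) target, unsigned long is only 32 bits wide, so a UL suffix does not make a constant 64-bit. The function names below are made up for illustration; the sketch just shows two spellings of an all-ones 64-bit value that do not depend on the width of long.

```c
#include <stdint.h>

/*
 * Converting -1 to an unsigned type wraps modulo 2^64, so this yields all
 * ones without relying on any integer-literal suffix.
 */
uint64_t all_ones_u64(void)
{
    return (uint64_t)-1;
}

/* The same value, written with a suffix that is at least 64 bits everywhere. */
uint64_t all_ones_ull(void)
{
    return 0xFFFFFFFFFFFFFFFFULL;
}
```

Both functions return the same value on 32-bit and 64-bit builds. The "-1" form that Roland chose has the added benefit of staying correct if the destination type is widened later.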

09 Oct, 2012 · 1 commit

mm: kill vma flag VM_RESERVED and mm->reserved_vm counter · 314e51b9

Committed by Konstantin Khlebnikov on Oct 08, 2012

A long time ago, in v2.4, VM_RESERVED kept the swapout process off the VMA.
It has since lost its original meaning but still has some effects:

 | effect                 | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump      | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes the reserved_vm counter from mm_struct.  Nobody seems
to care about it: it is not exported to userspace directly, and it only
reduces the total_vm shown in proc.

Thus VM_RESERVED can be replaced with VM_IO or with the pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() sets VM_DONTEXPAND | VM_DONTDUMP.

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

314e51b9

07 Oct, 2012 · 1 commit

infiniband: pass rdma_cm module to netlink_dump_start · 809d5fc9

Committed by Gao feng on Oct 04, 2012

Set netlink_dump_control.module to avoid a panic.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

809d5fc9

06 Oct, 2012 · 1 commit

idr: rename MAX_LEVEL to MAX_IDR_LEVEL · 125c4c70

Committed by Fengguang Wu on Oct 04, 2012

To avoid name conflicts:

  drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined

While at it, also make the other names more consistent and add
parentheses.

[akpm@linux-foundation.org: repair fallout]
[sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: walter harms <wharms@bfs.de>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

125c4c70
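
Commit 125c4c70 also adds parentheses to the macro definitions. The sketch below shows why that matters in general. The DEMO_* macros are invented for this illustration and are not the actual idr definitions.

```c
/* Hypothetical macros, not the real idr.h definitions. */
#define DEMO_BITS        5
#define DEMO_MAX_UNSAFE  DEMO_BITS + 1     /* no outer parentheses */
#define DEMO_MAX_SAFE    (DEMO_BITS + 1)   /* parenthesized */

/* Expands to 2 * 5 + 1, because '*' binds tighter than '+'. */
int doubled_unsafe(void)
{
    return 2 * DEMO_MAX_UNSAFE;
}

/* Expands to 2 * (5 + 1), which is what the macro name suggests. */
int doubled_safe(void)
{
    return 2 * DEMO_MAX_SAFE;
}
```

The unparenthesized macro silently changes meaning depending on the expression it is used in, which is the usual reason for wrapping macro bodies in parentheses.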

05 Oct, 2012 · 1 commit

RDMA/cma: Check that retry count values are in range · 4ede178a

Committed by Sean Hefty on Oct 03, 2012

The retry_count and rnr_retry_count connection parameters are both
3-bit values.  Check that the values are in range and reduce if
they're not.

This fixes a problem reported by Doug Ledford <dledford@redhat.com>
that resulted in the userspace rping test (part of the librdmacm
samples) failing to run over Intel IB HCAs.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Use min_t() to avoid warnings about type mismatch.  - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>

4ede178a
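
The range check described in commit 4ede178a amounts to clamping a value to what a 3-bit field can hold. A minimal user-space sketch follows. The function and macro names are hypothetical, and the kernel patch uses min_t() rather than the conditional expression shown here.

```c
#include <stdint.h>

#define RETRY_COUNT_MAX 7u  /* largest value a 3-bit field can hold */

/* Reduce an out-of-range retry count to the 3-bit maximum. */
uint8_t clamp_retry_count(unsigned int requested)
{
    return (uint8_t)(requested < RETRY_COUNT_MAX ? requested : RETRY_COUNT_MAX);
}
```

Clamping instead of rejecting keeps callers that pass a larger value working, which matches the "reduce if they're not" wording of the commit message.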

04 Oct, 2012 · 5 commits

IB/iser: Add more RX CQs to scale out processing of SCSI responses · 5a33a669

Committed by Alex Tabachnik on Sep 23, 2012

RX/TX CQs will now be selected from a per HCA pool.  For the RX flow
this has the effect of using different interrupt vectors when using
low level drivers (such as mlx4) that map the "vector" param provided
by the ULP on CQ creation to a dedicated IRQ/MSI-X vector.  This
allows the RX flow processing of IO responses to be distributed across
multiple CPUs.

QPs (--> iSER sessions) are assigned to CQs in round robin order using
the CQ with the minimum number of sessions attached to it.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

5a33a669
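
The CQ assignment policy in commit 5a33a669 (attach each new session to the CQ with the fewest sessions) can be sketched as a simple scan. The function name and the array representation of the per-HCA pool are assumptions made for this example. The driver's own data structures are not shown in the commit message.

```c
#include <stddef.h>

/*
 * Hypothetical selection helper: given the number of sessions currently
 * attached to each CQ in a pool, return the index of the CQ with the
 * fewest sessions. Ties go to the lowest index. Assumes n_cqs >= 1.
 */
size_t pick_least_loaded_cq(const unsigned int *sessions_per_cq, size_t n_cqs)
{
    size_t best = 0;

    for (size_t i = 1; i < n_cqs; i++) {
        if (sessions_per_cq[i] < sessions_per_cq[best])
            best = i;
    }
    return best;
}
```

When the caller increments the chosen counter after each assignment, successive sessions spread across the pool in round-robin fashion, and with it across the interrupt vectors those CQs map to.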

RDMA/nes: Bump the version number of nes driver · cf9fd75c

Committed by Tatyana Nikolova on Oct 02, 2012

Bump the version of the nes driver to reflect recent fixes.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

cf9fd75c

RDMA/nes: Remove unused module parameter "send_first" · a378e3a3

Committed by Tatyana Nikolova on Oct 02, 2012

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

a378e3a3

RDMA/nes: Remove unnecessary if-else statement · a67b078c

Committed by Tatyana Nikolova on Oct 02, 2012

Remove unnecessary if-else statement -- we do the same thing either way.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

a67b078c

RDMA/nes: Add missing break to switch. · d5fb476a

Committed by Tatyana Nikolova on Oct 02, 2012

Static code analyzer cppcheck points out a missing break.
Reported-by: David Binderman <dcb314@hotmail.com>
Addresses: <https://bugzilla.kernel.org/show_bug.cgi?id=47671>
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

d5fb476a

03 Oct, 2012 · 1 commit

IPoIB: Fix build with CONFIG_INFINIBAND_IPOIB_CM=n · 71d9c5f9

Committed by Roland Dreier on Oct 02, 2012

With the new netlink support in commit 862096a8 ("IB/ipoib: Add more
rtnl_link_ops callbacks") we need ipoib_set_mode() to be available even
if connected mode isn't built.  Move the function from ipoib_cm.c to
ipoib_main.c (and make a few CM-related macros available unconditionally).

This fixes the build error

    drivers/built-in.o: In function 'ipoib_changelink':
    ipoib_netlink.c:(.text+0x6a5fc9): undefined reference to 'ipoib_set_mode'
    ipoib_netlink.c:(.text+0x6a5fe3): undefined reference to 'ipoib_set_mode'

when CONFIG_INFINIBAND_IPOIB_CM isn't set.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>

71d9c5f9

02 Oct, 2012 · 2 commits

IB/ipoib: Add more rtnl_link_ops callbacks · 862096a8

Committed by Or Gerlitz on Sep 27, 2012

Add the rtnl_link_ops changelink and fill_info callbacks, through
which the admin can now set and get the driver mode and other policies.
Maintain the proprietary sysfs entries only for legacy child devices.

For child devices, set dev->iflink to point to the parent
device ifindex, such that user space tools can now correctly
show the uplink relation as done for vlan, macvlan, etc
devices. Pointed out by Patrick McHardy <kaber@trash.net>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

862096a8

IB/qib: Fix local access validation for user MRs · c00aaa1a

Committed by Mike Marciniszyn on Sep 28, 2012

Commit 8aac4cc3 ("IB/qib: RCU locking for MR validation") introduced
a bug that broke user post sends.  The proper validation of the MR
was lost in the patch.

This patch corrects that validation.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>

c00aaa1a

01 Oct, 2012 · 24 commits

IB/srp: Avoid having aborted requests hang · d8536670

Committed by Bart Van Assche on Aug 24, 2012

We need to call scsi_done() for commands after we abort them.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>

d8536670

IB/srp: Fix use-after-free in srp_reset_req() · 9b796d06

Committed by Bart Van Assche on Aug 24, 2012

srp_free_req() uses the scsi_cmnd structure contents to unmap
buffers, so we must invoke srp_free_req() before we release
ownership of that structure.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>

9b796d06

IB/qib: Add a qib driver version · e20d5838

Committed by Dean Luick on Sep 13, 2012

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

e20d5838

RDMA/nes: Fix compilation error when nes_debug is enabled · bca1935c

Committed by Tatyana Nikolova on Sep 20, 2012

Removing old variables caused a compile error from nes_debug().
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

bca1935c

RDMA/nes: Print hardware resource type · 81821644

Committed by Tatyana Nikolova on Sep 20, 2012

Hardware resource types are added and when a resource isn't available,
its type is printed.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

81821644

RDMA/nes: Fix for crash when TX checksum offload is off · fc4ba729

Committed by Tatyana Nikolova on Sep 20, 2012

When TX checksum offload is disabled for an iWarp connection,
skb->ip_summed needs to be set to CHECKSUM_NONE.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

fc4ba729

RDMA/nes: Cosmetic changes · 48a99563

Committed by Tatyana Nikolova on Sep 20, 2012

 - Remove unnecessary statement "if (1)"
 - Refactor a statement (wqe_misc |= NES_NIC_SQ_WQE_COMPLETION) out of
   if/else statement, because it is independent of the flow.
 - Define netdev->features in one line for clarity.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

48a99563

RDMA/nes: Fix for incorrect MSS when TSO is on · 6ad1be81

Committed by Tatyana Nikolova on Sep 20, 2012

In the TSO handling code, skb_shared_info() is used to get the MSS
instead of the bool function skb_is_gso() (which always returns 1).
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

6ad1be81

RDMA/nes: Fix incorrect resolving of the loopback MAC address · ef3d0c4a

Committed by Tatyana Nikolova on Sep 20, 2012

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

ef3d0c4a

IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes · 3806d08c

Committed by Jack Morgenstein on Aug 03, 2012

When we have VFs and PFs on the same host, the VFs are activated within
the mlx4_core module before the mlx4_ib kernel module is loaded.

When the mlx4_ib module initializes the PF (master), it now creates
MAD paravirtualization contexts for any VFs that are already active.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

3806d08c

mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations · 47605df9

Committed by Jack Morgenstein on Aug 03, 2012

Previously, the structure of a guest's proxy QPs followed the
structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
qp1 port 2, ...). The guest then did offset calculations on the
sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().

This is now changed so that the guest does no offset calculations
regarding proxy or tunnel QPs to use. This change frees the PPF from
needing to adhere to a specific order in allocating proxy and tunnel
QPs.

Now QUERY_FUNC_CAP provides each port individually with its proxy
qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
used directly where required (with no offset calculations).

To accomplish this change, several fields were added to the phys_caps
structure for use by the PPF and by non-SR-IOV mode:

base_sqpn -- in non-sriov mode, this was formerly sqp_start.
base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.

The current code in the PPF still adheres to the previous layout of
sqps, proxy-sqps and tunnel-sqps. However, the PPF can change this
layout without affecting VF or (paravirtualized) PF code.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

47605df9

mlx4: Paravirtualize Node Guids for slaves · afa8fd1d

Committed by Jack Morgenstein on Aug 03, 2012

This is necessary in order to support > 1 VF/PF in a VM for software
that uses the node guid as a discriminator, such as librdmacm.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

afa8fd1d

mlx4: Activate SR-IOV mode for IB · 026149cb

Committed by Jack Morgenstein on Aug 03, 2012

Remove the error returns for IB ports from mlx4_ib_add,
mlx4_INIT_PORT_wrapper, and mlx4_CLOSE_PORT_wrapper.

Currently, SRIOV is supported only for devices for which the
link layer is IB on all ports; RoCE support will be added later.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

026149cb

IB/mlx4: Miscellaneous adjustments for SR-IOV IB support · 992e8e6e

Committed by Jack Morgenstein on Aug 03, 2012

1. Allow only master to change node description.
2. Prevent AH leakage in send mads.
3. Take device part number from PCI structure, so that guests see the
   VF part number (and not the PF part number).
4. Place the device revision ID into caps structure at startup.
5. SET_PORT in update_gids_task needs to go through wrapper on master.
6. In mlx4_ib_event(), PORT_MGMT_EVENT needs to be handled in a work
   queue on the master, since it propagates events to slaves using
   GEN_EQE.
7. Do not support FMR on slaves.
8. Add spinlock to slave_event(), since it is called both in interrupt
   context and in process context (due to 6 above, and also if
   smp_snoop is used).  This fix was found and implemented by Saeed
   Mahameed <saeedm@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

992e8e6e

IB/mlx4: Add iov directory in sysfs under the ib device · c1e7e466

Committed by Jack Morgenstein on Aug 03, 2012

This directory is added only for the master -- slaves do not have it.

The sysfs iov directory is used to manage and examine the port P_Key
and guid paravirtualization.

Under iov/ports, the administrator may examine the gid and P_Key tables
as they are present in the device (and as are seen in the "network
view" presented to the SM).

Under the iov/<pci slot number> directories, the admin may map the
index numbers in the physical tables (as under iov/ports) to the
paravirtualized index numbers that guests see.

For example, if the administrator, for port 1 on guest 2 maps physical
pkey index 10 to virtual index 1, then that guest, whenever it uses
its pkey index 1, will actually be using the real pkey index 10.

Based on patch from Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

c1e7e466

IB/mlx4: Propagate P_Key and guid change port management events to slaves · 2a4fae14

Committed by Jack Morgenstein on Aug 03, 2012

P_Key change and guid change events are not of interest to all slaves,
but only to those slaves which "see" the table slots whose contents
have changed.

For example, if the guid at port 1, index 5 has changed in the PPF, we
wish to propagate the gid-change event only to the function which has
that guid index mapped to its port/guid table (in this case it is
slave #5). Other functions should not get the event, since the event
does not affect them.

Similarly with P_Keys -- P_Key change events are forwarded only to
slaves which have that P_Key index mapped to their virtual P_Key table.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

2a4fae14

mlx4: Add alias_guid mechanism · a0c64a17

Committed by Jack Morgenstein on Aug 03, 2012

For IB ports, we paravirtualize the GUID at index 0 on slaves.  The
GUID at index 0 seen by a slave is the actual GUID occupying the GUID
table at the slave-id index.

The driver, by default, requests at startup time that the subnet manager
populate its entire guid table with GUIDs. These guids are then mapped
(paravirtualized) to the slaves, and appear for each slave as its GUID
at index 0.

Until each slave has such a guid, its port status is DOWN.

The guid table is cached to support special QP paravirtualization, and
event propagation to slaves on guid change (we test to see if the guid
really changed before propagating an event to the slave).

To support this caching, add capability to __mlx4_ib_query_gid() to
obtain the network view (i.e., physical view) gid at index X, not just
the host (paravirtualized) view.

Based on a patch from Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

a0c64a17
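
Two ideas from commit a0c64a17 lend themselves to a short sketch: the alias lookup (a slave's GUID at index 0 is the physical entry at its slave-id index) and the cached comparison that suppresses events when nothing really changed. The function names and the flat array representation below are hypothetical, not the driver's structures.

```c
#include <stdint.h>

/*
 * Hypothetical lookup: the GUID a slave sees at index 0 is the entry of
 * the physical GUID table located at the slave's own id.
 */
uint64_t slave_guid_at_index0(const uint64_t *phys_guid_table,
                              unsigned int slave_id)
{
    return phys_guid_table[slave_id];
}

/*
 * Update the cached copy of one table slot. Returns 1 if the GUID really
 * changed (an event should be propagated to the slave), 0 otherwise.
 */
int guid_cache_update(uint64_t *cache, unsigned int index, uint64_t new_guid)
{
    if (cache[index] == new_guid)
        return 0;           /* unchanged: no event needed */
    cache[index] = new_guid;
    return 1;
}
```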

IB/mlx4: Add CM paravirtualization · 3cf69cc8

Committed by Amir Vadai on Aug 03, 2012

In CM para-virtualization:

1. Incoming requests are steered to the correct vHCA according to the
   embedded GID.
2. Communication IDs on outgoing requests are replaced by a globally
   unique ID, generated by the PPF, since there is no synchronization
   of ID generation between guests (and so these IDs are not
   guaranteed to be globally unique).  The guest's comm ID is stored,
   and is returned to the response MAD when it arrives.
Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

3cf69cc8
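
Point 2 of commit 3cf69cc8 describes an ID translation table: the guest's communication ID is stored, replaced by a globally unique one on the way out, and restored when the response arrives. The following is a deliberately simplified sketch of that idea. All names are hypothetical, the fixed-size array and the counter-based allocation are assumptions for brevity, and entries are never released.

```c
#include <stdint.h>

#define DEMO_MAX_IDS 16  /* arbitrary capacity for this sketch */

struct demo_id_map {
    uint32_t guest_id[DEMO_MAX_IDS];  /* comm ID chosen by the guest */
    int      slave[DEMO_MAX_IDS];     /* which guest sent it */
    uint32_t next;                    /* next globally unique ID to hand out */
};

/* Store the guest's ID; return the globally unique replacement, or -1 if full. */
int64_t demo_map_outgoing(struct demo_id_map *m, int slave, uint32_t guest_id)
{
    if (m->next >= DEMO_MAX_IDS)
        return -1;
    m->guest_id[m->next] = guest_id;
    m->slave[m->next] = slave;
    return m->next++;
}

/* On a response, recover the owning slave and its original ID. 0 on success. */
int demo_map_incoming(const struct demo_id_map *m, uint32_t global_id,
                      int *slave, uint32_t *guest_id)
{
    if (global_id >= m->next)
        return -1;          /* unknown ID */
    *slave = m->slave[global_id];
    *guest_id = m->guest_id[global_id];
    return 0;
}
```

Two guests that happen to pick the same local ID receive different global IDs, so a response can be routed back unambiguously.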

IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV · b9c5d6a6

Committed by Oren Duer on Aug 03, 2012

MCG paravirtualization support includes:
- Creating multicast groups by VFs, and keeping accounting of them
- Leaving multicast groups by VFs
- Updating SM only with real changes in the overall picture of MCGs status
- Creation of MGID=0 groups (let SM choose MGID)

Note that the MCG module maintains its own internal MCG object
reference counts.  The reason for this is that the IB core is used to
track only the multicast groups joins generated by the PF it runs
over.  The PF IB core layer is unaware of slaves, so it cannot be used
to keep track of MCG joins they generate.
Signed-off-by: Oren Duer <oren@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

b9c5d6a6

mlx4: MAD_IFC paravirtualization · 0a9a0188

Committed by Jack Morgenstein on Aug 03, 2012

The MAD_IFC firmware command fulfills two functions.

First, it is used in the QP0/QP1 MAD-handling flow to obtain
information from the FW (for answering queries), and for setting
variables in the HCA (MAD SET packets).

For this, MAD_IFC should provide the FW (physical) view of the data.
This is the view that OpenSM needs.  We call this the "network view".

In the second case, MAD_IFC is used by various verbs to obtain data
regarding the local HCA (e.g., ib_query_device()).  We call this the
"host view".

This data needs to be paravirtualized.

MAD_IFC therefore needs a wrapper function, and also needs another
flag indicating whether it should provide the network view (when it is
called by ib_process_mad in special-qp packet handling), or the host
view (when it is called while implementing a verb).

There are currently 2 flag parameters in mlx4_MAD_IFC already:
ignore_bkey and ignore_mkey.  These two parameters are replaced by a
single "mad_ifc_flags" parameter, with different bits set for each
flag.  A third flag is added: "network-view/host-view".
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

0a9a0188
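
Commit 0a9a0188 folds the two boolean parameters into one flags word and adds a third bit for the view. A minimal sketch of that pattern is shown below. The DEMO_* names and bit positions are illustrative assumptions, and the driver's actual identifiers and values may differ.

```c
/* Illustrative bit assignments, not taken from the driver headers. */
enum demo_mad_ifc_flags {
    DEMO_MAD_IFC_IGNORE_MKEY = 1 << 0,
    DEMO_MAD_IFC_IGNORE_BKEY = 1 << 1,
    DEMO_MAD_IFC_NET_VIEW    = 1 << 2,  /* set: network view, clear: host view */
};

/* Build the single flags argument from what used to be separate parameters. */
int demo_make_mad_ifc_flags(int ignore_mkey, int ignore_bkey, int net_view)
{
    int flags = 0;

    if (ignore_mkey)
        flags |= DEMO_MAD_IFC_IGNORE_MKEY;
    if (ignore_bkey)
        flags |= DEMO_MAD_IFC_IGNORE_BKEY;
    if (net_view)
        flags |= DEMO_MAD_IFC_NET_VIEW;
    return flags;
}

/* Non-zero when the caller asked for the network (physical) view. */
int demo_wants_net_view(int flags)
{
    return (flags & DEMO_MAD_IFC_NET_VIEW) != 0;
}
```

A flags word keeps the function signature stable if further options are added later, which is presumably why the two existing parameters were merged rather than a third one appended.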

IB/mlx4: SR-IOV multiplex and demultiplex MADs · 37bfc7c1

Committed by Jack Morgenstein on Aug 03, 2012

Special QPs are paravirtualized.

vHCAs are not given direct access to QP0/1. Rather, these QPs are
operated by a special context hosted by the PF, which mediates access
to/from vHCAs.  This is done by opening a "tunnel" per vHCA port per
QP0/1. A tunnel comprises a pair of UD QPs: a "Tunnel QP" in the
PF-context and a "Proxy QP" in the vHCA.  All vHCA MAD traffic must
pass through the corresponding tunnel.  vHCA QPs cannot be assigned to
VL15 and are denied the well-known QKey.

Outgoing messages are "de-multiplexed" (i.e., directed to the wire via
the real special QP).

Incoming messages are "multiplexed" (i.e., steered by the PPF to the
correct VF or to the PF).

QP0 access is restricted to the PF vHCA. VF vHCAs also have (virtual)
QP0s, but they never receive any SMPs and all SMPs sent are discarded.
QP1 traffic is allowed for all vHCAs, but special care is required to
bridge the gap between the host and network views.

Specifically:
- Transaction IDs are mapped to guarantee uniqueness among vHCAs
- CM para-virtualization
  o   Incoming requests are steered to the correct vHCA according to the embedded GID
  o   Local communication IDs are mapped to ensure uniqueness among vHCAs
  (see the patch that adds CM paravirtualization.)
- Multicast para-virtualization
  o   The PF context aggregates membership state from all vHCAs
  o   The SA is contacted only when the aggregate membership changes
  o   If the aggregate does not change, the PF context will provide the
      requesting vHCA with the proper response.
  (see the patch that adds multicast group paravirtualization)

Incoming MADs are steered according to:
- the DGID if a GRH is present
- the mapped transaction ID for response MADs
- the embedded GID in CM requests
- the remote communication ID in other CM messages
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

37bfc7c1

mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop · 54679e14

Committed by Jack Morgenstein on Aug 03, 2012

This requires:

1. Replacing the paravirtualized P_Key index (inserted by the guest)
   with the real P_Key index.

2. For UD QPs, placing the guest's true source GID index in the
   address path structure mgid field, and setting the ud_force_mgid
   bit so that the mgid is taken from the QP context and not from the
   WQE when posting sends.

3. For UC and RC QPs, placing the guest's true source GID index in the
   address path structure mgid field.

4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
   proxy/tunnel pair.

Since not all the above adjustments occur in all the QP transitions,
the QP transitions require separate wrapper functions.

Secondly, initialize the P_Key virtualization table to its default
values: Master virtualized table is 1-1 with the real P_Key table,
guest virtualized table has P_Key index 0 mapped to the real P_Key
index 0, and all the other P_Key indices mapped to the reserved
(invalid) P_Key at index 127.

Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache
and generating events on the master only if a P_Key actually changed.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

54679e14
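
The default P_Key virtualization table described in commit 54679e14 can be written down as a small initializer. This is a sketch under assumptions: the function name is hypothetical, and the table length of 128 is inferred only from the commit's mention of a reserved P_Key at index 127.

```c
#include <stdint.h>

#define DEMO_PKEY_TABLE_LEN     128  /* assumed table size for this sketch */
#define DEMO_INVALID_PKEY_INDEX 127  /* index of the reserved (invalid) P_Key */

/*
 * Fill virt2phys so that virt2phys[v] is the real P_Key index used when a
 * function refers to paravirtualized index v.
 */
void demo_init_pkey_map(uint8_t *virt2phys, int is_master)
{
    for (int v = 0; v < DEMO_PKEY_TABLE_LEN; v++) {
        if (is_master)
            virt2phys[v] = (uint8_t)v;   /* 1-1 with the real table */
        else
            virt2phys[v] = (v == 0) ? 0 : DEMO_INVALID_PKEY_INDEX;
    }
}
```

An administrator-driven remapping, such as the sysfs example in commit c1e7e466 where physical index 10 is mapped to virtual index 1, would then correspond to overwriting a single entry (virt2phys[1] = 10) for that guest and port.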

IB/mlx4: Initialize SR-IOV IB support for slaves in master context · fc06573d

Committed by Jack Morgenstein on Aug 03, 2012

Allocate SR-IOV paravirtualization resources and MAD demuxing contexts
on the master.

This has two parts.  The first part is to initialize the structures to
contain the contexts.  This is done at master startup time in
mlx4_ib_init_sriov().

The second part is to actually create the tunneling resources required
on the master to support a slave.  This is performed when the master
detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event
generated when a slave initializes its comm channel).

For the master, there is no such startup event, so it creates its own
tunneling resources when it starts up.  In addition, the master also
creates the real special QPs.  The ib_core layer on the master causes
creation of proxy special QPs, since the master is also
paravirtualized at the ib_core layer.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

fc06573d

IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support · 1ffeb2eb

Committed by Jack Morgenstein on Aug 03, 2012

1. Introduce the basic SR-IOV paravirtualization context objects for
   multiplexing and demultiplexing MADs.
2. Introduce support for the new proxy and tunnel QP types.

This patch introduces the objects required by the master for managing
QP paravirtualization for guests.

struct mlx4_ib_sriov is created by the master only.
It is a container for the following:

1. All the info required by the PPF to multiplex and de-multiplex MADs
   (including those from the PF). (struct mlx4_ib_demux_ctx demux)
2. All the info required to manage alias GUIDs (i.e., the GUID at
   index 0 that each guest perceives).  This is not the GUID which is
   actually at index 0, but the GUID which is at index[<VF number>]
   in the physical table.
3. structures which are used to manage CM paravirtualization
4. structures for managing the real special QPs when running in SR-IOV
   mode.  The real SQPs are controlled by the PPF in this case.  All
   SQPs created and controlled by the ib core layer are proxy SQPs.

struct mlx4_ib_demux_ctx contains the information per port needed
to manage paravirtualization:

1. All multicast paravirt info
2. All tunnel-qp paravirt info for the port.
3. GUID-table and GUID-prefix for the port
4. work queues.

struct mlx4_ib_demux_pv_ctx contains all the info for managing the
paravirtualized QPs for one slave/port.

struct mlx4_ib_demux_pv_qp contains the info needed to run an individual
QP (either tunnel qp or real SQP).

Note:  We made use of the 2 most significant bits in enum
mlx4_ib_qp_flags (based on enum ib_qp_create_flags in ib_verbs.h).
We need these bits in the low-level driver for internal purposes.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

1ffeb2eb

openeuler / Kernel · synced successfully over 1 year ago
