Commits · 320438301b85038e995b5a40a24c43cbc0ed4909 · openanolis / cloud-kernel

11 Aug, 2017 1 commit

Merge branches '32bit_lid' and 'irq_affinity' into k.o/merge-test · 32043830

Committed by Doug Ledford on Aug 10, 2017

Conflicts:
	drivers/infiniband/hw/mlx5/main.c - Both add new code
	include/rdma/ib_verbs.h - Both add new code
Signed-off-by: Doug Ledford <dledford@redhat.com>

32043830

09 Aug, 2017 17 commits

nvme-rdma: use intelligent affinity based queue mappings · 0b36658c

Committed by Sagi Grimberg on Jul 13, 2017

Use the generic block layer affinity mapping helper. Also,
limit nr_hw_queues to the number of irq vectors of the rdma
device, as we don't really need more.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

0b36658c

block: Add rdma affinity based queue mapping helper · 24c5dc66

Committed by Sagi Grimberg on Jul 13, 2017

Like pci and virtio, we add an rdma helper for affinity
spreading. This achieves optimal mq affinity assignments
according to the underlying rdma device affinity maps.
Reviewed-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

24c5dc66
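As a rough illustration of what an affinity-based queue mapping does, the sketch below models the idea in Python rather than kernel C. The names `map_queues_by_affinity` and `get_vector_affinity` are hypothetical stand-ins, and the round-robin fallback is only an assumption about what a caller might do when the device exposes no affinity information. It is not the kernel implementation.

```python
from typing import Callable, Dict, Iterable, Optional


def map_queues_by_affinity(
    nr_queues: int,
    nr_cpus: int,
    get_vector_affinity: Callable[[int], Optional[Iterable[int]]],
) -> Dict[int, int]:
    """Build a cpu -> queue map from per-vector CPU affinity masks."""
    cpu_to_queue: Dict[int, int] = {}
    for queue in range(nr_queues):
        mask = get_vector_affinity(queue)
        if mask is None:
            # No affinity information available: fall back to a plain
            # round-robin assignment (assumed fallback, for illustration).
            return {cpu: cpu % nr_queues for cpu in range(nr_cpus)}
        for cpu in mask:
            # Every CPU in the vector's mask is served by this queue.
            cpu_to_queue[cpu] = queue
    return cpu_to_queue


# Hypothetical device: vector 0 serves CPUs 0-1, vector 1 serves CPUs 2-3.
affinity = {0: [0, 1], 1: [2, 3]}
mapping = map_queues_by_affinity(2, 4, affinity.get)
```

With the hypothetical masks above, each CPU lands on the queue whose completion vector is affine to it, which is the property the helper is meant to provide.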

mlx5: support ->get_vector_affinity · 40b24403

Committed by Sagi Grimberg on Jul 13, 2017

Simply refer to the generic affinity mask helper.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

40b24403

RDMA/core: expose affinity mappings per completion vector · c66cd353

Committed by Sagi Grimberg on Jul 13, 2017

This will allow ULPs to intelligently locate threads based
on completion vector cpu affinity mappings. In case the
driver does not expose a get_vector_affinity callout, return
NULL so the caller can maintain fallback logic.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

c66cd353

mlx5: move affinity hints assignments to generic code · a435393a

Committed by Sagi Grimberg on Jul 13, 2017

The generic API takes care of spreading affinity similar to
what mlx5 open coded (and even handles asymmetric
configurations better). Ask the generic API to spread affinity
for us, and feed it pre_vectors that do not participate
in affinity settings (which is an improvement to what we
had before).

The affinity assignments should match what mlx5 tried to
do earlier but now we do not set affinity to async, cmd
and pages dedicated vectors.

Also, remove mlx5e_get_cpu and introduce mlx5e_get_node
(used for allocation purposes) and mlx5_get_vector_affinity
(for indirection table construction) as they provide the needed
information. Luckily, we have generic helpers to get cpumask
and node given an irq vector. mlx5_get_vector_affinity will
be used by mlx5_ib in a subsequent patch.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a435393a

mlx5e: don't assume anything on the irq affinity mappings of the device · a85e5474

Committed by Sagi Grimberg on Jul 13, 2017

mlx5e currently assumes that the first irq vectors are spread
across the device home node cpus. With the new generic affinity
mappings this is no longer the case, hence mlx5e should not rely on
this anymore.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a85e5474

mlx5: convert to generic pci_alloc_irq_vectors · 78249c42

Committed by Sagi Grimberg on Jul 13, 2017

Now that we have generic code to allocate an array
of irq vectors and even correctly spread their affinity,
correctly handle cpu hotplug events and more, we're much
better off using it.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

78249c42

IB/CM: Set appropriate slid and dlid when handling CM request · ac3a949f

Committed by Dasaratharaman Chandramouli on Jun 08, 2017

If extended LIDs are being used, a connection request contains
OPA GIDs. Extract the LIDs from the OPA GIDs and populate the
slid/dlid fields in the path records that are created when handling
a connection request.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ac3a949f
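To make the "extract the LID from an OPA GID" step concrete, here is a small Python model. It assumes a layout in which the top 24 bits of the 64-bit interface ID carry a special OUI and the low 32 bits carry the LID. The OUI value and all helper names below are assumptions for illustration and should be checked against the kernel's OPA address headers before being relied on.

```python
OPA_SPECIAL_OUI = 0x00066A  # assumed OUI that marks an interface ID as OPA


def make_opa_interface_id(lid: int) -> int:
    """Build a 64-bit interface ID carrying a 32-bit LID (host byte order)."""
    return (OPA_SPECIAL_OUI << 40) | (lid & 0xFFFFFFFF)


def is_opa_gid(interface_id: int) -> bool:
    """An interface ID is treated as OPA when its top 24 bits match the OUI."""
    return (interface_id >> 40) == OPA_SPECIAL_OUI


def lid_from_opa_gid(interface_id: int) -> int:
    """The LID is taken from the low 32 bits of the interface ID."""
    return interface_id & 0xFFFFFFFF


# Hypothetical extended LID that does not fit in 16 bits.
iface = make_opa_interface_id(0x12345)
slid = lid_from_opa_gid(iface) if is_opa_gid(iface) else None
```

The sketch ignores byte order: a real GID is stored big-endian, so kernel code would convert before masking.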

IB/CM: Create appropriate path records when handling CM request · 6b3c0e6e

Committed by Dasaratharaman Chandramouli on Jun 08, 2017

When handling an incoming connection request, ib_cm creates
either an IB or an OPA path record based on the gid field
in the request.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

6b3c0e6e

IB/CM: Add OPA Path record support to CM · e92aa00a

Committed by Hiatt, Don on Jun 08, 2017

Add OPA path record support to the Connection Manager.
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e92aa00a

IB/core: Change wc.slid from 16 to 32 bits · 7db20ecd

Committed by Hiatt, Don on Jun 08, 2017

The slid field in struct ib_wc is increased to 32 bits.
This enables core components to use larger LIDs if needed.
The user ABI is unchanged and returns 16-bit values when queried.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

7db20ecd

IB/core: Change port_attr.sm_lid from 16 to 32 bits · db58540b

Committed by Dasaratharaman Chandramouli on Jun 08, 2017

The sm_lid field in struct ib_port_attr is increased to 32 bits. This
enables core components to use larger LIDs if needed.
The user ABI is unchanged and returns 16-bit values when queried.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

db58540b

IB/core: Change port_attr.lid size from 16 to 32 bits · 582faf31

Committed by Dasaratharaman Chandramouli on Jun 08, 2017

The lid field in struct ib_port_attr is increased to 32 bits. This enables core
components to use larger LIDs if needed.
The user ABI is unchanged and returns 16-bit values when queried.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

582faf31

IB/mad: Change slid in RMPP recv from 16 to 32 bits · 1cb2fc0d

Committed by Dasaratharaman Chandramouli on Jun 08, 2017

MAD RMPP contains a slid field which is 16 bits in
length; increase it to 32 bits.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

1cb2fc0d

IB/IPoIB: Increase local_lid to 32 bits · 7e93e2cb

Committed by Dasaratharaman Chandramouli on Jun 08, 2017

IPoIB contains a local_lid field which is 16 bits in
length; increase it to 32 bits.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

7e93e2cb

IB/srpt: Increase lid and sm_lid to 32 bits · 4c473690

Committed by Dasaratharaman Chandramouli on Jun 08, 2017

srpt contains lid and sm_lid fields which are 16 bits in
length; increase them to 32 bits.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

4c473690

IB/core: Convert ah_attr from OPA to IB when copying to user · d541e455

Committed by Dasaratharaman Chandramouli on Jun 08, 2017

OPA address handle attributes that have 32-bit LIDs have to
be converted to IB address handle attributes with the LID field
programmed in the GID before copying to user space.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d541e455

01 Aug, 2017 22 commits

IB/hfi1: Always perform offline transition · 913cc671

Committed by Sebastian Sanchez on Jul 29, 2017

Always initiate an offline transition request
when a link down occurs. The firmware will
use this request to confirm that the driver
has seen the link down message. A host version
is set to indicate this driver behavior to the
firmware.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

913cc671

IB/hfi1: Prevent link down request double queuing · 626c077c

Committed by Sebastian Sanchez on Jul 29, 2017

When link interrupts occur, multiple link down requests
could be queued up when only one is needed. This could get
the hfi1 out of sync with its link partner during LNI.

Only allow one link down request to be queued at any one time.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

626c077c
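The "only one request queued at a time" rule can be pictured as a guarded flag. The following Python sketch is a generic model of that pattern, not the driver's code. The class and method names are hypothetical, and a `threading.Lock` stands in for whatever synchronization the driver actually uses.

```python
import threading


class LinkDownGate:
    """Allow at most one outstanding link-down request."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._queued = False
        self.work = []  # stand-in for a workqueue

    def request_link_down(self, reason: str) -> bool:
        """Queue a request unless one is already pending."""
        with self._lock:
            if self._queued:
                return False  # duplicate request is dropped
            self._queued = True
            self.work.append(reason)
            return True

    def complete(self) -> None:
        """Mark the pending request as handled so a new one may be queued."""
        with self._lock:
            self._queued = False


gate = LinkDownGate()
first = gate.request_link_down("interrupt A")
second = gate.request_link_down("interrupt B")
```

Here the second request is rejected while the first is still pending. After `complete()` is called, a new request is accepted again.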

IB/hfi1: Create workqueue for link events · 71d47008

Committed by Sebastian Sanchez on Jul 29, 2017

Currently, link down interrupts queue link entries
on a workqueue intended for sending events only.
Create a workqueue for queuing link events.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

71d47008

IB/{rdmavt, hfi1, qib}: Fix panic with post receive and SGE compression · 3ffea7d8

Committed by Mike Marciniszyn on Jul 29, 2017

The server side of qperf panics as follows:

[242446.336860] IP: report_bug+0x64/0x10
[242446.341031] PGD 1c0c067
[242446.341032] P4D 1c0c067
[242446.343951] PUD 1c0d063
[242446.346870] PMD 8587ea067
[242446.349788] PTE 800000083e14016
[242446.352901]
[242446.358352] Oops: 0003 [#1] SM
[242446.437919] CPU: 1 PID: 7442 Comm: irq/92-hfi1_0 k Not tainted 4.12.0-mam-asm #1
[242446.446365] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/201
[242446.458397] task: ffff8808392d2b80 task.stack: ffffc9000664000
[242446.465097] RIP: 0010:report_bug+0x64/0x10
[242446.469859] RSP: 0018:ffffc900066439c0 EFLAGS: 0001000
[242446.475784] RAX: ffffffffa06647e4 RBX: ffffffffa06461e1 RCX: 000000000000000
[242446.483840] RDX: 0000000000000907 RSI: ffffffffa0675040 RDI: ffffffffffff740
[242446.491897] RBP: ffffc900066439e0 R08: 0000000000000001 R09: 000000000000025
[242446.499953] R10: ffffffff81a253df R11: 0000000000000133 R12: ffffc90006643b3
[242446.508010] R13: ffffffffa065bbf0 R14: 00000000000001e5 R15: 000000000000000
[242446.516067] FS:  0000000000000000(0000) GS:ffff88085f640000(0000) knlGS:000000000000000
[242446.525191] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003
[242446.531698] CR2: ffffffffa06647ee CR3: 0000000001c09000 CR4: 00000000001406e
[242446.539756] Call Trace
[242446.542582]  fixup_bug+0x2c/0x5
[242446.546277]  do_trap+0x12b/0x18
[242446.549972]  do_error_trap+0x89/0x11
[242446.554171]  ? hfi1_copy_sge+0x271/0x2b0 [hfi1
[242446.559324]  ? ttwu_do_wakeup+0x1e/0x14
[242446.563795]  ? ttwu_do_activate+0x77/0x8
[242446.568363]  do_invalid_op+0x20/0x3
[242446.572448]  invalid_op+0x1e/0x3
[242446.576247] RIP: 0010:hfi1_copy_sge+0x271/0x2b0 [hfi1
[242446.582075] RSP: 0018:ffffc90006643be8 EFLAGS: 0001004
[242446.587999] RAX: 0000000000000000 RBX: ffff88083e0fa240 RCX: 000000000000000
[242446.596058] RDX: 0000000000000000 RSI: ffff880842508000 RDI: ffff88083e0fa24
[242446.604116] RBP: ffffc90006643c28 R08: 0000000000000000 R09: 000000000000000
[242446.612172] R10: ffffc90009473640 R11: 0000000000000133 R12: 000000000000000
[242446.620228] R13: 0000000000000000 R14: 0000000000002000 R15: ffff88084250800
[242446.628293]  ? hfi1_copy_sge+0x1a1/0x2b0 [hfi1
[242446.633449]  hfi1_rc_rcv+0x3da/0x1270 [hfi1
[242446.638312]  ? sc_buffer_alloc+0x113/0x150 [hfi1
[242446.643662]  hfi1_ib_rcv+0x1c9/0x2e0 [hfi1
[242446.648428]  process_receive_ib+0x19a/0x270 [hfi1
[242446.653866]  ? process_rcv_qp_work+0xd2/0x160 [hfi1
[242446.659505]  handle_receive_interrupt_nodma_rtail+0x184/0x2e0 [hfi1
[242446.666693]  ? irq_finalize_oneshot+0x100/0x10
[242446.671846]  receive_context_thread+0x1b/0x140 [hfi1
[242446.677576]  irq_thread_fn+0x1e/0x4
[242446.681659]  irq_thread+0x13c/0x1b
[242446.685646]  ? irq_forced_thread_fn+0x60/0x6
[242446.690604]  kthread+0x112/0x15
[242446.694298]  ? irq_thread_check_affinity+0xe0/0xe
[242446.699738]  ? kthread_park+0x60/0x6
[242446.703919]  ? do_syscall_64+0x67/0x15
[242446.708292]  ret_from_fork+0x25/0x3
[242446.712374] Code: 63 78 04 44 0f b7 70 08 41 89 d0 4c 8d 2c 38 41 83 e0 01 f6 c2 02 74 17 66 45 85 c0 74 11 f6 c2 04 b9 01 00 00 00 75 bb 83 ca 04 <66> 89 50 0a 66 45 85 c0 74 52 0f b6 48 0b 41 0f b7 f6 4d 89 e0
[242446.733527] RIP: report_bug+0x64/0x100 RSP: ffffc900066439c
[242446.739935] CR2: ffffffffa06647e
[242446.743763] ---[ end trace 0e90a20d0aa494f7 ]--

The root cause is that the qib/hfi1 post receive call to rvt_lkey_ok()
doesn't interpret the new return value from rvt_lkey_ok() properly,
leading to an mr reference count underrun.

Additionally, remove an unused argument in rvt_sge_adjacent()
as well as an unneeded incr local in rvt_post_one_wr().

Fixes: Commit 14fe13fc ("IB/rdmavt: Compress adjacent SGEs in rvt_lkey_ok()")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3ffea7d8

IB/hfi1: Disambiguate corruption and uninitialized error cases · c822652e

Committed by Jan Sokolowski on Jul 29, 2017

The error messages when checksum validation of the platform
configuration fields populated into the ASIC scratch registers fails are
ambiguous. Disambiguate them.
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

c822652e

IB/hfi1: Only set fd pointer when base context is completely initialized · e87473bc

Committed by Michael J. Ruhl on Jul 29, 2017

The allocate_ctxt() function adds the context to the fd data structure.
Since the context is not completely initialized, this can cause confusion
as to whether the context is valid or not.

Move the fd reference from allocate_ctxt() to setup_base_ctxt().
Update the necessary functions to be aware of this move.
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e87473bc

IB/hfi1: Do not enable disabled port on cable insert · 96603ed8

Committed by Jan Sokolowski on Jul 29, 2017

Fix an issue where a disabled port can be enabled by
inserting a cable. The port should be explicitly
enabled instead.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

96603ed8

IB/hfi1: Harden state transition to Armed and Active · 5efd40ca

Committed by Alex Estrin on Jul 29, 2017

There is a window that allows other threads to read
'host_link_state' as the new state before the actual hardware state
is set. This patch closes the window by indicating a new state only
after the hardware transition is complete.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

5efd40ca

IB/hfi1: Split copy_to_user data copy for better security · f13a6e5e

Committed by Michael J. Ruhl on Jul 24, 2017

A copy_to_user() call assumes that two members of a data structure
are sequential.  Since this may not always be true, separate the copies
to ensure a safe copy.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

f13a6e5e
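Why two struct members cannot be assumed to sit back to back can be shown in user space with `ctypes`, which lays structures out using the platform's C rules. The `Example` structure and the `copy_member` helper below are hypothetical and unrelated to the hfi1 data structures. The sketch only illustrates that copying each member by its own offset and size never depends on the gap between them.

```python
import ctypes
import sys


class Example(ctypes.Structure):
    # A 1-byte member followed by a 4-byte member: the compiler's layout
    # rules may insert padding between them.
    _fields_ = [("flag", ctypes.c_uint8), ("value", ctypes.c_uint32)]


# Number of padding bytes between the end of `flag` and the start of `value`.
gap = Example.value.offset - (Example.flag.offset + Example.flag.size)


def copy_member(obj: ctypes.Structure, field) -> bytes:
    """Copy exactly one member, using its own offset and size."""
    return ctypes.string_at(ctypes.addressof(obj) + field.offset, field.size)


sample = Example(flag=1, value=0xAABBCCDD)
flag_bytes = copy_member(sample, Example.flag)
value_bytes = copy_member(sample, Example.value)
value_roundtrip = int.from_bytes(value_bytes, sys.byteorder)
```

On common ABIs `gap` is nonzero, so a single copy spanning both members would also pick up the padding bytes in between. Two separate copies avoid that.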

IB/hfi1: Verify port data VLs credits on transition to Armed · 5e2d6764

Committed by Alex Estrin on Jul 24, 2017

There is a window where the FM can read the buffer control table
and decide not to program buffers. When a port goes down, the code
clears the table and if it is not programmed, posted SDMA descriptors
will never complete due to no buffer credits.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

5e2d6764

IB/hfi1: Move saving PCI values to a separate function · a618b7e4

Committed by Bartlomiej Dudek on Jul 24, 2017

During PCIe initialization some register values from
PCI config space are saved in order to restore them later
(i.e. after reset). Restoring those values is done by a
function called restore_pci_variables, while saving them
is done directly in the function hfi1_pcie_ddinit.
Move saving the values to a separate function that mirrors
the restore functionality.
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a618b7e4

IB/hfi1: Fix initialization failure for debug firmware · a156abb3

Committed by Byczkowski, Jakub on Jul 24, 2017

Loading debug signed firmware fails if started immediately after
a failed attempt to load production firmware. A short delay is
required, so add a delay of about 100us after an RSA check failure.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a156abb3

IB/hfi1: Fix code consistency for if/else blocks in chip.c · 59ec8736

Committed by Jan Sokolowski on Jul 24, 2017

Code structure is not consistent for if/else blocks and break
instructions in set_link_state for case HLS_UP_INIT. Physical
state uses break in case of an error and if/else blocks for
logical use cases. These blocks should be implemented consistently.
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

59ec8736

IB/hfi1: Send MAD traps until repressed · bf90aadd

Committed by Michael J. Ruhl on Jul 24, 2017

A trap should be sent to the FM until the FM sends a repress message.
This is in line with the IBTA 13.4.9.

Add the ability to resend traps until a repress message is received.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael N. Henry <michael.n.henry@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

bf90aadd
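The resend-until-repressed behaviour amounts to a retry loop with an external stop condition. The Python sketch below is a toy model with hypothetical names (`send_until_repressed`, `FakeFM`). The `max_attempts` bound is an addition that keeps the example finite and is not something the commit message specifies.

```python
from typing import Callable


def send_until_repressed(
    send_trap: Callable[[], None],
    is_repressed: Callable[[], bool],
    max_attempts: int,
) -> int:
    """Resend a trap until a repress arrives; return the number of sends."""
    attempts = 0
    while attempts < max_attempts and not is_repressed():
        send_trap()
        attempts += 1
    return attempts


class FakeFM:
    """Toy fabric manager that represses after seeing a number of traps."""

    def __init__(self, repress_after: int) -> None:
        self.received = 0
        self.repress_after = repress_after

    def receive_trap(self) -> None:
        self.received += 1

    def has_repressed(self) -> bool:
        return self.received >= self.repress_after


fm = FakeFM(repress_after=3)
attempts = send_until_repressed(fm.receive_trap, fm.has_repressed, max_attempts=10)
```

A real implementation would resend on a timer rather than in a tight loop. The loop here only shows the termination condition.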

IB/hfi1: Pass the context pointer rather than the index · 2250563e

Committed by Michael J. Ruhl on Jul 24, 2017

The hfi1_rcvctrl() function receives an index which it then converts
to an rcd.  Since most functions have the rcd, use that instead.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

2250563e

IB/hfi1: Use context pointer rather than context index · 17573972

Committed by Michael J. Ruhl on Jul 24, 2017

The hfi1_<set|clear>_ctxt_<j|p>key functions take a context index and
look up the context based on that index.

Since the context index is being retrieved from the context, this
doesn't seem optimal.

Pass the context pointer for use, rather than the context index.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

17573972

IB/hfi1: Size rcd array index correctly and consistently · e6f7622d

Committed by Michael J. Ruhl on Jul 24, 2017

The array index for the rcd array is sized several different ways
throughout the code.

Use the user interface size (u16) as the standard size and update the
necessary code to reflect this.

u16 is large enough for the largest number of supported contexts.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e6f7622d

IB/hfi1: Remove unused user context data members · 91d970ab

Committed by Michael J. Ruhl on Jul 24, 2017

Several data members of the user context have become unused over time.
Clean them up.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

91d970ab

IB/hfi1: Assign context does not clean up file descriptor correctly on error · 42492011

Committed by Michael J. Ruhl on Jul 24, 2017

In the error path for context allocation, the file descriptor pointer
should not point to a context when an error occurs.

Clean up the appropriate references on error.

Fixes: Commit 62239fc6 ("IB/hfi1: Clean up on context initialization failure")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

42492011

IB/hfi1: Serve the most starved iowait entry first · bcad2913

Committed by Kaike Wan on Jul 24, 2017

When an egress resource (SDMA descriptors, pio credits) is not available,
a sending thread will be put on the resource's wait queue. When the
resource becomes available again, up to a fixed number of sending threads
can be awakened sequentially and removed from the wait queue, depending
on the number of waiting threads and the number of free resources. Since
each awakened sending thread will send as many packets as possible, it
is highly likely that the first sending thread will consume all the
egress resources. Subsequently, it will be put back to the end of the wait
queue. Depending on the timing when the later sending threads wake up,
they may not be able to send any packet and be again put back to the end
of the wait queue sequentially, right behind the first sending thread.
This starvation cycle continues until some sending threads exceed their
retry limit and consequently fail.

This patch fixes the issue with two simple changes:
(1) Any starved sending thread will be put at the head of the wait queue
while a served sending thread will be put at the tail;
(2) The most starved sending thread will be served first.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

bcad2913
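The two rules in this commit (starved waiters go to the head, and the most starved waiter is served first) can be modelled with a small queue. The Python sketch below is an illustration only. `Waiter`, `starved_cnt`, `requeue` and `wake` are hypothetical names, and the way starvation is counted is an assumption.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Waiter:
    name: str
    starved_cnt: int = 0  # times woken without being able to send


class WaitQueue:
    def __init__(self) -> None:
        self._q: Deque[Waiter] = deque()

    def requeue(self, waiter: Waiter, made_progress: bool) -> None:
        """Rule (1): starved waiters go to the head, served ones to the tail."""
        if made_progress:
            waiter.starved_cnt = 0
            self._q.append(waiter)
        else:
            waiter.starved_cnt += 1
            self._q.appendleft(waiter)

    def wake(self, n: int) -> List[Waiter]:
        """Rule (2): among the waiters woken, serve the most starved first."""
        woken = [self._q.popleft() for _ in range(min(n, len(self._q)))]
        woken.sort(key=lambda w: w.starved_cnt, reverse=True)
        return woken


q = WaitQueue()
q.requeue(Waiter("a"), made_progress=True)                  # served: tail
q.requeue(Waiter("b", starved_cnt=1), made_progress=False)  # starved twice
q.requeue(Waiter("c"), made_progress=False)                 # starved once
order = [w.name for w in q.wake(3)]
```

In this run the twice-starved waiter `b` is served before `c` and `a`, so a thread that keeps losing the race is not pushed behind the thread that consumed the resources.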

IB/hfi1: Fix bar0 mapping to use write combining · cb51c5d2

Committed by Mike Marciniszyn on Jul 24, 2017

When the debugpat kernel boot flag is turned on the following
traces are printed:

[ 1884.793168] x86/PAT: Overlap at 0x90000000-0x92000000
[ 1884.803510] x86/PAT: reserve_memtype added [mem 0x91200000-0x9127ffff],
track uncached-minus, req write-combining, ret uncached-minus
[ 1884.818167] hfi1 0000:05:00.0: hfi1_0: WC Remapped RcvArray:
ffffc9000a980000

The ioremap_wc() clearly is not returning a write combining mapping due
to an overlap where the RcvArray is mapped in an uncached mapping prior
to creating the proposed write combining mapping.

The patch replaces the single base register for uncached CSRs that
used to overlap the RcvArray with two mappings. One, kregbase1, from the
bar0 up to the RcvArray and another, kregbase2, from the end of the
RcvArray to the pio send buffer space.  A new dd field, base2_start,
is used to convert the zero-based offset in the CSR routines to the
correct kregbase1/kregbase2 mapping.  A single direct write of the
RcvArray CSRs is replaced with hfi1_put_tid() to ensure correct access
using the new disjoint mapping.

Additionally, the kregend field is deleted since it is only ever written.

debugpat now shows the RcvArray as write combining:
[   35.688990] x86/PAT: reserve_memtype added [mem 0x91200000-0x9127ffff],
track write-combining, req write-combining, ret write-combining

To insulate from any potential issues with write combining, all
writeq calls are now flushed in hfi1_put_tid() and rcv_array_wc_fill().
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

cb51c5d2
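The offset translation described above, where a zero-based CSR offset is routed to one of two disjoint mappings, can be sketched as follows. This is a Python model with made-up addresses. `addr_from_offset` is a hypothetical name, and the assumption is that `base2_start` marks the first offset served by the second mapping.

```python
def addr_from_offset(
    offset: int, kregbase1: int, kregbase2: int, base2_start: int
) -> int:
    """Translate a zero-based CSR offset into an address in one of two maps."""
    if offset >= base2_start:
        # Offsets past the RcvArray live in the second mapping, which
        # starts at base2_start in the zero-based offset space.
        return kregbase2 + (offset - base2_start)
    # Offsets below the RcvArray live in the first mapping.
    return kregbase1 + offset


# Made-up mapping addresses and split point, for illustration only.
KREGBASE1 = 0x1000
KREGBASE2 = 0x9000
BASE2_START = 0x400

low = addr_from_offset(0x10, KREGBASE1, KREGBASE2, BASE2_START)
high = addr_from_offset(0x410, KREGBASE1, KREGBASE2, BASE2_START)
```

Callers keep using a single offset space while the RcvArray range in between can be mapped separately as write combining.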

IB/hfi1: Check return values from PCI config API calls · c53df62c

Committed by Bartlomiej Dudek on Jun 30, 2017

Ensure that return values from kernel PCI config access
API calls in the HFI driver are checked, and react properly if
they are not as expected (i.e. not successful).
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

c53df62c
