Commits · b810253bd9342f863a86ec7dfff4a5a7a0394d2f · openeuler / Kernel

28 Jun, 2016 1 commit

cxl: Add mechanism for delivering AFU driver specific events · b810253b

Authored by Philippe Bergheaud on Jun 23, 2016

This adds an afu_driver_ops structure with fetch_event() and
event_delivered() callbacks. An AFU driver such as cxlflash can fill
this out and associate it with a context to enable passing custom AFU
specific events to userspace.

This also adds a new kernel API function cxl_context_pending_events(),
that the AFU driver can use to notify the cxl driver that new specific
events are ready to be delivered, and wake up anyone waiting on the
context wait queue.

The current count of AFU driver specific events is stored in the field
afu_driver_events of the context structure.

The cxl driver checks the afu_driver_events count during poll, select,
read, etc. calls to check if an AFU driver specific event is pending,
and calls fetch_event() to obtain and deliver that event. This way, the
cxl driver takes care of all the usual locking semantics around these
calls and handles all the generic cxl events, so that the AFU driver
only needs to worry about its own events.

fetch_event() returns a struct cxl_event_afu_driver_reserved, allocated
by the AFU driver, and filled in with the specific event information and
size. Total event size (header + data) should not be greater than
CXL_READ_MIN_SIZE (4K).

The cxl driver prepends an appropriate cxl event header, copies the event
to userspace, and finally calls event_delivered() to return the status of
the operation to the AFU driver. The event is identified by the context
and cxl_event_afu_driver_reserved pointers.

Since AFU drivers provide their own means for userspace to obtain the
AFU file descriptor (e.g. cxlflash uses an ioctl on its scsi file
descriptor to obtain the AFU file descriptor) and the generic cxl driver
will never use this event, the ABI of the event is up to each individual
AFU driver.

Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
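
The delivery flow described in this commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: the structure layouts, the header type value and every name below (deliver_afu_driver_event, demo_ops and so on) are hypothetical stand-ins, not the kernel's definitions, and locking and the copy to userspace are reduced to a memcpy.

```c
#include <stddef.h>
#include <string.h>

#define READ_MIN_SIZE 4096 /* the 4K limit the message gives for CXL_READ_MIN_SIZE */

/* Hypothetical stand-ins for the kernel structures; real layouts differ. */
struct event_header { unsigned short type; unsigned short size; };
struct afu_event { size_t data_size; unsigned char data[64]; };
struct context;

struct afu_driver_ops {
    struct afu_event *(*fetch_event)(struct context *ctx);
    void (*event_delivered)(struct context *ctx, struct afu_event *ev, int rc);
};

struct context {
    int afu_driver_events;              /* count of pending driver events */
    const struct afu_driver_ops *ops;
    int last_rc;                        /* demo only: status seen by the driver */
};

/* Generic side: fetch one pending event, prepend a header, copy it out,
 * then report the outcome back to the AFU driver. */
long deliver_afu_driver_event(struct context *ctx, unsigned char *buf, size_t len)
{
    struct event_header hdr;
    struct afu_event *ev;
    size_t total;
    int rc = 0;

    if (ctx->afu_driver_events <= 0 || !ctx->ops)
        return 0;
    ev = ctx->ops->fetch_event(ctx);
    if (!ev)
        return 0;
    ctx->afu_driver_events--;

    total = sizeof(hdr) + ev->data_size;
    if (ev->data_size > sizeof(ev->data) || total > READ_MIN_SIZE || total > len) {
        rc = -1;                        /* event does not fit this read */
    } else {
        hdr.type = 1;                   /* hypothetical "AFU driver" event type */
        hdr.size = (unsigned short)total;
        memcpy(buf, &hdr, sizeof(hdr));
        memcpy(buf + sizeof(hdr), ev->data, ev->data_size);
    }
    ctx->ops->event_delivered(ctx, ev, rc);
    return rc ? rc : (long)total;
}

/* AFU driver side: a single statically allocated demo event. */
static struct afu_event demo_event = { 4, { 0xde, 0xad, 0xbe, 0xef } };

static struct afu_event *demo_fetch(struct context *ctx)
{
    (void)ctx;
    return &demo_event;
}

static void demo_delivered(struct context *ctx, struct afu_event *ev, int rc)
{
    (void)ev;
    ctx->last_rc = rc;
}

const struct afu_driver_ops demo_ops = { demo_fetch, demo_delivered };
```

In this sketch the generic code owns the pending count and the size check, so the driver callbacks only produce an event and learn whether it was delivered, which mirrors the division of labour the message describes.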

16 Jun, 2016 3 commits

cxl: Add support for CAPP DMA mode · b385c9e9

Authored by Ian Munsie on Jun 08, 2016

This adds support for using CAPP DMA mode, which is required for XSL
based cards such as the Mellanox CX4 to function.

This is currently an RFC as it depends on the corresponding support to
be merged into skiboot first, which was submitted here:
http://patchwork.ozlabs.org/patch/625582/

In the event that the skiboot on the system does not have the above
support, it will indicate as such in the kernel log and abort the init
process.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Abstract the differences between the PSL and XSL · 6d382616

Authored by Frederic Barrat on May 24, 2016

The XSL (Translation Service Layer) is a stripped down version of the
PSL (Power Service Layer) used in some cards such as the Mellanox CX4.

Like the PSL, it implements the CAIA architecture, but has a number of
differences, mostly in its implementation dependent registers. This
adds an ops structure to abstract these differences to bring initial
support for XSL CAPI devices.

The XSL does not implement the optional architected SERR register.
Although it treats it as a reserved register and should work with
no special treatment, attempting to access it will cause the XSL_FEC
(First Error Capture) register to be filled out, preventing it from
capturing any subsequent errors. Therefore, this patch also prevents the
kernel from trying to set up the SERR register so that the FEC register
may still be useful, and to save one interrupt.

The XSL also uses a special DMA cxl mode, which uses a slightly
different init sequence for the CAPP and PHB. The kernel support for
this will be in a future patch once the corresponding support has been
merged into skiboot.

Co-authored-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Update process element after allocating interrupts · 292841b0

Authored by Ian Munsie on May 24, 2016

In the kernel API, it is possible to attempt to allocate AFU interrupts
after already starting a context. Since the process element structure
used by the hardware is only filled out at the time the context is
started, it will not be updated with the interrupt numbers that have
just been allocated and therefore AFU interrupts will not work unless
they were allocated prior to starting the context.

This can present some difficulties as each CAPI enabled PCI device in
the kernel API has a default context, which may need to be started very
early to enable translations, potentially before interrupts can easily
be set up.

This patch makes the API more flexible to allow interrupts to be
allocated after a context has already been started and takes care of
updating the PE structure used by the hardware and notifying it to
discard any cached copy it may have.

The update is currently performed via a terminate/remove/add sequence.
This is necessary on some hardware such as the XSL that does not
properly support the update LLCMD.

Note that this is only supported on powernv at present - attempting to
perform this ordering on PowerVM will raise a warning.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

11 May, 2016 3 commits

cxl: Check periodically the coherent platform function's state · 266eab8f

Authored by Christophe Lombard on Apr 22, 2016

In the PowerVM environment, the PHYP CoherentAccel component manages
the state of the Coherent Accelerator Processor Interface adapter and
virtualizes CAPI resources, handles CAPP, PSL, PSL Slice errors - and
interrupts - and provides a new set of hcalls for the OS APIs to utilize
Accelerator Function Unit (AFU).

During the course of operation, a coherent platform function can
encounter errors. Some possible reasons for errors are:
• Hardware recoverable and unrecoverable errors
• Transient and over-threshold correctable errors

PHYP implements its own state model for the coherent platform function.
The state of the AFU is available through a hcall.

The current implementation of the cxl driver, for the PowerVM
environment, checks this state of the AFU only when an action is
requested - open a device, ioctl command, memory map, attach/detach a
process - from an external driver - cxlflash, libcxl. If an error is
detected the cxl driver handles the error according to the content of the
Power Architecture Platform Requirements document.

But in case of low-level troubles (or error injection), the PHYP
component may reset the card and change the AFU state. The PHYP
interface doesn't provide any way to be notified when that happens, which
means that the cxl driver:
• cannot immediately handle the state change of the AFU.
• cannot notify other drivers (cxlflash, ...)

The purpose of this patch is to wake up the cpu periodically to check
the current state of each AFU and to see if we need to enter an error
recovery path.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Add kernel API to allow a context to operate with relocate disabled · 7a0d85d3

Authored by Ian Munsie on May 06, 2016

cxl devices typically access memory using an MMU in much the same way as
the CPU, and each context includes a state register much like the MSR in
the CPU. Like the CPU, the state register includes a bit to enable
relocation, which we currently always enable.

In some cases, it may be desirable to allow a device to access memory
using real addresses instead of effective addresses, so this adds a new
API, cxl_set_translation_mode, that can be used to disable relocation
on a given kernel context. This can allow for the creation of a special
privileged context that the device can use if it needs relocation
disabled, and can use regular contexts at times when it needs relocation
enabled.

This interface is only available to users of the kernel API for obvious
reasons, and will never be supported in a virtualised environment.

This will be used by the upcoming cxl support in the mlx5 driver.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
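
As a rough illustration of the relocation switch described above, the following userspace sketch records a per-context mode and applies it when the state register value is computed. The bit position, the structure and the function names are hypothetical, and refusing the change once a context has started is an assumption of this sketch rather than something the message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real state register layout is defined by
 * the hardware architecture, not by this sketch. */
#define SR_RELOCATE (UINT64_C(1) << 4)

struct kernel_context {
    bool real_mode; /* true: the device uses real addresses */
    bool started;
};

/* Record the requested mode. Assumption: a running context is not updated. */
int set_translation_mode(struct kernel_context *ctx, bool real_mode)
{
    if (ctx->started)
        return -1;
    ctx->real_mode = real_mode;
    return 0;
}

/* Compute the state register value used when the context is started. */
uint64_t calculate_sr(const struct kernel_context *ctx, uint64_t base)
{
    return ctx->real_mode ? (base & ~SR_RELOCATE) : (base | SR_RELOCATE);
}
```

Keeping the mode as a flag on the context, rather than as a global, is what lets one privileged context run with relocation disabled while regular contexts keep it enabled.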

cxl: Remove duplicate #defines · 0e5b5ba1

Authored by Ian Munsie on May 04, 2016

These defines are not used, but other equivalent definitions
(CXL_SPA_SW_CMD_*) are used. Remove the unused defines.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

27 Apr, 2016 1 commit

cxl: Poll for outstanding IRQs when detaching a context · 2bc79ffc

Authored by Michael Neuling on Apr 22, 2016

When detaching contexts, we may still have interrupts in the system
which are yet to be delivered to any CPU and be acked in the PSL.
This can result in a subsequent unrelated process getting a spurious
IRQ or an interrupt for a non-existent context.

This polls the PSL to ensure that the PSL is clear of IRQs for the
detached context, before removing the context from the idr.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
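
The ordering described above (poll the PSL until it is clear, and only then remove the context from the idr) might look roughly like this in a userspace model. The psl_model structure, the bounded poll count and all function names are assumptions made for illustration; the message does not say how the real driver bounds the wait.

```c
#include <stdbool.h>

/* Hypothetical model of the hardware: interrupts still in flight for one context. */
struct psl_model { int inflight_irqs; };

/* Stand-in for reading a PSL status register; each poll lets one interrupt drain. */
static bool psl_irq_pending(struct psl_model *psl)
{
    if (psl->inflight_irqs > 0) {
        psl->inflight_irqs--;
        return true;
    }
    return false;
}

/* Poll until the PSL reports no interrupts, or give up after max_polls tries. */
int wait_for_irqs_clear(struct psl_model *psl, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (!psl_irq_pending(psl))
            return 0;
    }
    return -1; /* timed out, interrupts may still be outstanding */
}

/* Detach path: poll first, and only then drop the context from the lookup table. */
int detach_context(struct psl_model *psl, bool *in_idr, int max_polls)
{
    int rc = wait_for_irqs_clear(psl, max_polls);

    *in_idr = false;
    return rc;
}
```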

22 Apr, 2016 1 commit

cxl: Allow initialization on timebase sync failures · e009a7e8

Authored by Frederic Barrat on Mar 21, 2016

Failure to synchronize the PSL timebase currently prevents the
initialization of the cxl card, thus rendering the card useless. This
is too extreme for a feature which is rarely used, if at all. No
hardware AFUs or software is currently using PSL timebase.

This patch still tries to synchronize the PSL timebase when the card
is initialized, but ignores the error if it can't. Instead, it reports
a status via /sys.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

09 Mar, 2016 14 commits

cxl: Ignore probes for virtual afu pci devices · 17eb3eef

Authored by Vaibhav Jain on Feb 29, 2016

Add a check at the beginning of cxl_probe function to ignore virtual pci
devices created for each afu registered. This fixes the error
messages logged about missing CXL vsec, when cxl probe is unable to
find necessary vsec entries in device pci config space. The error
messages logged are of the form:

cxl-pci 0004:00:00.0: ABORTING: CXL VSEC not found!
cxl-pci 0004:00:00.0: cxl_init_adapter failed: -19

Cc: Ian Munsie <imunsie@au1.ibm.com>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: fbarrat@linux.vnet.ibm.com
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Adapter failure handling · 0d400f77

Authored by Christophe Lombard on Mar 04, 2016

Check the AFU state whenever an API is called. The hypervisor may
issue a reset of the adapter when it detects a fault. When that happens,
it launches an error recovery which will move the AFU either to a
permanent failure state or to the disabled state.
If the AFU is found to be disabled, detach all existing contexts from
it before issuing an AFU reset to re-enable it.

Before detaching contexts, notify any kernel driver through the EEH
callbacks of the AFU pci device.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
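
The recovery order in this message (notify kernel drivers, detach contexts, then reset) can be illustrated with a small state model. The state names, the afu_model structure and the return values are hypothetical; in particular, returning success after a recovery is an assumption of this sketch, and the real driver may report the interrupted call differently.

```c
/* Hypothetical AFU states, loosely following the commit message. */
enum afu_state { AFU_NORMAL, AFU_DISABLED, AFU_PERM_FAILURE };

struct afu_model {
    enum afu_state state;
    int attached_contexts;
    int notifications;   /* how many times kernel drivers were notified */
};

/* Called at the start of every API entry point in this sketch. */
int afu_check_state(struct afu_model *afu)
{
    switch (afu->state) {
    case AFU_NORMAL:
        return 0;
    case AFU_DISABLED:
        afu->notifications++;        /* 1. notify kernel drivers first */
        afu->attached_contexts = 0;  /* 2. detach all existing contexts */
        afu->state = AFU_NORMAL;     /* 3. reset re-enables the AFU */
        return 0;
    default:
        return -1;                   /* permanent failure: nothing to recover */
    }
}
```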

cxl: Support the cxl kernel API from a guest · d601ea91

Authored by Frederic Barrat on Mar 04, 2016

Like on bare-metal, the cxl driver creates a virtual PHB and a pci
device for the AFU. The configuration space of the device is mapped to
the configuration record of the AFU.

Reuse the code defined in afu_cr_read8|16|32() when reading the
configuration space of the AFU device.

Even though the (virtual) AFU device is a pci device, the adapter is
not. So a driver using the cxl kernel API cannot read the VPD of the
adapter through the usual PCI interface. Therefore, we add a call to
the cxl kernel API:
ssize_t cxl_read_adapter_vpd(struct pci_dev *dev, void *buf, size_t count);

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Support to flash a new image on the adapter from a guest · 594ff7d0

Authored by Christophe Lombard on Mar 04, 2016

The new flash.c file contains the logic to flash a new image on the
adapter, through a hcall. It is an iterative process, with chunks of
data of 1M at a time. There are also 2 phases: write and verify. The
flash operation itself is driven from a user-land tool.
Once flashing is successful, an rtas call is made to update the device
tree with the new property values for the adapter and the AFU(s).

Add a new char device for the adapter, so that the flash tool can
access the card, even if there is no valid AFU on it.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
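
The iterative flashing described above (1M chunks, a write phase and a verify phase) can be sketched as a pair of nested loops. The callback type stands in for the hypervisor call and is purely hypothetical, as are all names here; running the whole write phase before the whole verify phase is an assumption of this sketch.

```c
#include <stddef.h>

#define CHUNK_SIZE (1024u * 1024u) /* 1M per iteration, as in the message */

enum flash_phase { PHASE_WRITE, PHASE_VERIFY };

/* Hypothetical callback standing in for the hypervisor call. */
typedef int (*chunk_fn)(enum flash_phase phase, const unsigned char *data,
                        size_t len, void *arg);

/* Feed the image to `send` chunk by chunk, once per phase. */
int flash_image(const unsigned char *image, size_t size, chunk_fn send, void *arg)
{
    for (int phase = PHASE_WRITE; phase <= PHASE_VERIFY; phase++) {
        for (size_t off = 0; off < size; off += CHUNK_SIZE) {
            size_t len = size - off < CHUNK_SIZE ? size - off : CHUNK_SIZE;
            int rc = send((enum flash_phase)phase, image + off, len, arg);

            if (rc)
                return rc; /* stop at the first failing chunk */
        }
    }
    return 0;
}

/* Example callback that only counts calls and bytes per phase. */
struct flash_stats { int calls[2]; size_t bytes[2]; };

int count_chunk(enum flash_phase phase, const unsigned char *data,
                size_t len, void *arg)
{
    struct flash_stats *s = arg;

    (void)data;
    s->calls[phase]++;
    s->bytes[phase] += len;
    return 0;
}
```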

cxl: sysfs support for guests · 4752876c

Authored by Christophe Lombard on Mar 04, 2016

Filter out a few adapter parameters which don't make sense in a guest.
Document the changes.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Add guest-specific code · 14baf4d9

Authored by Christophe Lombard on Mar 04, 2016

The new of.c file contains code to parse the device tree to find out
about cxl adapters and AFUs.

guest.c implements the guest-specific callbacks for the backend API.

The process element ID is not known until the context is attached, so
we have to separate the context ID assigned by the cxl driver from the
process element ID visible to the user applications. In bare-metal,
the 2 IDs match.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
[mpe: Fix SMP=n build, fix PSERIES=n build, minor whitespace fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Separate bare-metal fields in adapter and AFU data structures · cbffa3a5

Authored by Christophe Lombard on Mar 04, 2016

Introduce sub-structures containing the bare-metal specific fields in
the structures describing the adapter (struct cxl) and AFU (struct
cxl_afu).
Update all their references.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: New hcalls to support cxl adapters · 444c4ba4

Authored by Christophe Lombard on Mar 04, 2016

The hypervisor calls provide an interface with a coherent platform
facility and function. It matches version 0.16 of the 'PAPR changes'
document.

The following hcalls are supported:
H_ATTACH_CA_PROCESS    Attach a process element to a coherent platform
                       function.
H_DETACH_CA_PROCESS    Detach a process element from a coherent
                       platform function.
H_CONTROL_CA_FUNCTION  Allow the partition to manipulate or query
                       certain coherent platform function behaviors.
H_COLLECT_CA_INT_INFO  Collect interrupt info about a coherent
                       platform function after an interrupt occurred.
H_CONTROL_CA_FAULTS    Control the operation of a coherent platform
                       function after a fault occurs.
H_DOWNLOAD_CA_FACILITY Support for downloading a base adapter image to
                       the coherent platform facility, and for
                       validating the entire image after the download.
H_CONTROL_CA_FACILITY  Allow the partition to manipulate or query
                       certain coherent platform facility behaviors.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Update cxl_irq() prototype · 6d625ed9

Authored by Frederic Barrat on Mar 04, 2016

The context parameter when calling cxl_irq() should be strongly typed.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Isolate a few bare-metal-specific calls · ea2d1f95

Authored by Frederic Barrat on Mar 04, 2016

A few functions are mostly common between bare-metal and guest and
just need minor tuning. To avoid crowding the backend API, introduce a
few 'if' checks based on the CPU being in HV mode.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Rename some bare-metal specific functions · 2b04cf31

Authored by Frederic Barrat on Mar 04, 2016

Rename a few functions, changing the 'cxl_' prefix to either
'cxl_pci_' or 'cxl_native_', to make clear that the implementation is
bare-metal specific.

Those functions will have an equivalent implementation for a guest in
a later patch.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Introduce implementation-specific API · 5be587b1

Authored by Frederic Barrat on Mar 04, 2016

The backend API (in cxl.h) lists some low-level functions whose
implementation is different on bare-metal and in a guest. Each
environment implements its own functions, and the common code uses
them through function pointers, defined in cxl_backend_ops.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
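
The function-pointer pattern this message describes can be shown with a tiny userspace example. Only the idea of an ops table comes from the message; the structure members, the single afu_reset operation and the way a backend is selected are hypothetical simplifications.

```c
#include <stdbool.h>

/* Which low-level path performed the last reset (demo bookkeeping only). */
enum reset_path { PATH_NONE, PATH_MMIO, PATH_HCALL };

struct afu { enum reset_path last_reset; };

/* One table of function pointers per environment. */
struct backend_ops {
    int (*afu_reset)(struct afu *afu);
};

/* Bare-metal flavour: would touch the card registers directly. */
static int native_afu_reset(struct afu *afu)
{
    afu->last_reset = PATH_MMIO;
    return 0;
}

/* Guest flavour: would ask the hypervisor to do the work. */
static int guest_afu_reset(struct afu *afu)
{
    afu->last_reset = PATH_HCALL;
    return 0;
}

const struct backend_ops native_ops = { native_afu_reset };
const struct backend_ops guest_ops = { guest_afu_reset };

/* Pick a backend once, based on whether the CPU runs in hypervisor mode. */
const struct backend_ops *select_backend(bool hv_mode)
{
    return hv_mode ? &native_ops : &guest_ops;
}

/* Common code never needs to know which environment it is running in. */
int common_afu_reset(const struct backend_ops *ops, struct afu *afu)
{
    return ops->afu_reset(afu);
}
```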

cxl: Move bare-metal specific code to specialized files · d56d301b

Authored by Frederic Barrat on Mar 04, 2016

Move a few functions around to better separate code specific to
bare-metal environment from code which will be commonly used between
guest and bare-metal.

Code specific to bare-metal is meant to be in native.c or pci.c
only. It's basically anything which touches the card p1 registers,
some p2 registers not needed from a guest and the PCI interface.

Co-authored-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Move common code away from bare-metal-specific files · 86331862

Authored by Christophe Lombard on Mar 04, 2016

Move around some functions which will be accessed from the bare-metal
and guest environments.
Code in native.c and pci.c is meant to be bare-metal specific.
Other files contain code which may be shared with guests.

Co-authored-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Manoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

05 Jan, 2016 1 commit

cxl: Fix DSI misses when the context owning task exits · 7b8ad495

Authored by Vaibhav Jain on Nov 24, 2015

Presently when a user-space process issues CXL_IOCTL_START_WORK ioctl we
store the pid of the current task_struct and use it to get pointer to
the mm_struct of the process, while processing page or segment faults
from the capi card. However, this causes issues when the thread that had
originally issued the start-work ioctl exits, in which case the stored
pid is no longer valid and the cxl driver is unable to handle faults as
the mm_struct corresponding to the process is no longer accessible.

This patch fixes this issue by using the mm_struct of the next alive
task in the thread group. This is done by iterating over all the tasks
in the thread group starting from thread group leader and calling
get_task_mm on each one of them. When a valid mm_struct is obtained the
pid of the associated task is stored in the context replacing the
exiting one for handling future faults.

The patch introduces a new function named get_mem_context that checks
whether the current task pointed to by ctx->pid is dead. If so, it performs
the steps described above. Also, a new variable cxl_context.glpid is
introduced which stores the pid of the thread group leader associated
with the context owning task.

Reported-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reported-by: Frank Haverkamp <HAVERKAM@de.ibm.com>
Suggested-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
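
The fallback described in this message (use the recorded task while it is alive, otherwise adopt the first live task in the thread group) can be modelled in a few lines of userspace C. The task and mem_ctx structures are simplified stand-ins, a NULL mm represents an exited task, and treating group[0] as the thread group leader is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a task; mm is NULL once the task has exited. */
struct task { int pid; void *mm; };

struct mem_ctx {
    int pid;    /* task currently used for fault handling */
    int glpid;  /* thread group leader */
};

/* Return a usable mm for the context. group[0] is assumed to be the leader. */
void *get_mem_context(struct mem_ctx *ctx, const struct task *group, size_t n)
{
    /* Fast path: the recorded task is still alive. */
    for (size_t i = 0; i < n; i++) {
        if (group[i].pid == ctx->pid && group[i].mm)
            return group[i].mm;
    }
    /* Slow path: walk the group from the leader and adopt the first live task. */
    for (size_t i = 0; i < n; i++) {
        if (group[i].mm) {
            ctx->pid = group[i].pid;
            return group[i].mm;
        }
    }
    return NULL; /* every task in the group has exited */
}
```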

24 Nov, 2015 1 commit

cxl: Fix possible idr warning when contexts are released · 1b5df59e

Authored by Vaibhav Jain on Nov 16, 2015

An idr warning is reported when a context is released after the capi card
is unbound from the cxl driver via sysfs. Below are the steps to
reproduce:

1. Create multiple afu contexts in a user-space application using libcxl.
2. Unbind capi card from cxl using command of form
   echo <capi-card-pci-addr> > /sys/bus/pci/drivers/cxl-pci/unbind
3. Exit/kill the application owning afu contexts.

After above steps a warning message is usually seen in the kernel logs
of the form "idr_remove called for id=<context-id> which is not
allocated."

This is caused by the function cxl_release_afu which destroys the
contexts_idr table. So when a context is released no entry for context pe
is found in the contexts_idr table and idr code prints this warning.

This patch fixes this issue by increasing & decreasing the ref-count on
the afu device when a context is initialized or when it's freed
respectively. This prevents the afu from being released until all the
afu contexts have been released. The patch introduces two new functions
namely cxl_afu_get/put that manage the ref-count on the afu device.

Also the patch removes code inside cxl_dev_context_init that increases ref
on the afu device as it's guaranteed to be alive during this function.

Reported-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
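
The reference-counting fix can be pictured with a minimal userspace model. The names afu_get and afu_put echo the cxl_afu_get/put pair mentioned above, but the structures and the plain integer counter are hypothetical; the kernel would rely on its device reference counting instead.

```c
#include <stdbool.h>
#include <stddef.h>

struct afu_dev {
    int refcount;
    bool released;   /* set when the last reference is dropped */
};

void afu_get(struct afu_dev *afu)
{
    afu->refcount++;
}

void afu_put(struct afu_dev *afu)
{
    if (--afu->refcount == 0)
        afu->released = true; /* the context table would be destroyed here */
}

struct ctx { struct afu_dev *afu; };

/* Each context pins the AFU for its whole lifetime. */
void context_init(struct ctx *c, struct afu_dev *afu)
{
    c->afu = afu;
    afu_get(afu);
}

void context_free(struct ctx *c)
{
    afu_put(c->afu);
    c->afu = NULL;
}
```

With this arrangement, unbinding the card only drops the driver's own reference, and the AFU is torn down after the last context is freed.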

01 Oct, 2015 1 commit

cxl: fix leak of IRQ names in cxl_free_afu_irqs() · 8dde152e

Authored by Andrew Donnellan on Sep 30, 2015

cxl_free_afu_irqs() doesn't free IRQ names when it releases an AFU's IRQ
ranges. The userspace API equivalent in afu_release_irqs() calls
afu_irq_name_free() to release the IRQ names.

Call afu_irq_name_free() in cxl_free_afu_irqs() to release the IRQ names.
Make afu_irq_name_free() non-static to allow this.

Reported-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Fixes: 6f7f0b3d ("cxl: Add AFU virtual PHB and kernel API")
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

30 Aug, 2015 2 commits

cxl: Set up and enable PSL Timebase · 390fd592

Authored by Philippe Bergheaud on Aug 28, 2015

This patch configures the PSL Timebase function and enables it,
after the CAPP has been initialized by OPAL.

Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Fix force unmapping mmaps of contexts allocated through the kernel api · 55e07668

Authored by Ian Munsie on Aug 27, 2015

The cxl user api uses the address_space associated with the file when we
need to force unmap all cxl mmap regions (e.g. on eeh, driver detach,
etc). Currently, contexts allocated through the kernel api do not do
this and instead skip the mmap invalidation, potentially allowing them
to poke at the hardware after such an event, which may cause all sorts
of trouble.

This patch allocates an address_space for cxl contexts allocated through
the kernel api so that the same invalidate path will work for these contexts
as well. We don't use the anonymous inode's address_space, as doing so
could invalidate any mmaps of completely unrelated drivers using
anonymous file descriptors.

This patch also introduces a kernelapi flag, so we know when freeing the
context if the address_space was allocated by us and needs to be freed.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

18 Aug, 2015 1 commit

cxl: Add alternate MMIO error handling · d9232a3d

Authored by Ian Munsie on Jul 23, 2015

Userspace programs using cxl currently have to use two strategies for
dealing with MMIO errors simultaneously. They have to check every read
for a return of all Fs in case the adapter has gone away and the kernel
has not yet noticed, and they have to deal with SIGBUS in case the
kernel has already noticed, invalidated the mapping and marked the
context as failed.

In order to simplify things, this patch adds an alternative approach
where the kernel will return a page filled with Fs instead of delivering
a SIGBUS. This allows userspace to only need to deal with one of these
two error paths, and is intended for use in libraries that use cxl
transparently and may not be able to safely install a signal handler.

This approach will only work if certain constraints are met. Namely, if
the application is both reading and writing to an address in the problem
state area it cannot assume that a non-FF read is OK, as it may just be
reading out a value it has previously written. Further - since only one
page is used per context a write to a given offset would be visible when
reading the same offset from a different page in the mapping (this only
applies within a single context, not between contexts).

An application could deal with this by e.g. making sure it also reads
from a read-only offset after any reads to a read/write offset.

Due to these constraints, this functionality must be explicitly
requested by userspace when starting the context by passing in the
CXL_START_WORK_ERR_FF flag.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
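
The workaround suggested above, reading a read-only offset after a read of a read/write offset, could be wrapped by an application in a helper along these lines. This is a userspace sketch with hypothetical names; it assumes the application knows of a read-only register that can never legitimately contain all Fs.

```c
#include <stdbool.h>
#include <stdint.h>

#define ALL_FS UINT64_MAX

/*
 * Read a read/write register, then confirm with a read-only register.
 * A stale value written earlier by the application may still be visible at
 * the read/write offset on the all-Fs page, so the read-only register is
 * what reveals the failure.
 */
bool mmio_read_checked(const volatile uint64_t *rw_reg,
                       const volatile uint64_t *ro_reg, uint64_t *out)
{
    uint64_t val = *rw_reg;

    if (val == ALL_FS || *ro_reg == ALL_FS)
        return false; /* adapter has gone away */
    *out = val;
    return true;
}
```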

14 Aug, 2015 5 commits

cxl: EEH support · 9e8df8a2

Authored by Daniel Axtens on Aug 14, 2015

EEH (Enhanced Error Handling) allows a driver to recover from the
temporary failure of an attached PCI card. Enable basic CXL support
for EEH.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Allow the kernel to trust that an image won't change on PERST. · 13e68d8b

Authored by Daniel Axtens on Aug 14, 2015

Provide a kernel API and a sysfs entry which allow a user to specify
that when a card is PERSTed, its image will stay the same, allowing
it to participate in EEH.

cxl_reset is used to reflash the card. In that case, we cannot safely
assert that the image will not change. Therefore, disallow cxl_reset
if the flag is set.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Allocate and release the SPA with the AFU · 05155772

Authored by Daniel Axtens on Aug 14, 2015

Previously the SPA was allocated and freed upon entering and leaving
AFU-directed mode. This causes some issues for error recovery - contexts
hold a pointer inside the SPA, and they may persist after the AFU has
been detached.

We would ideally like to allocate the SPA when the AFU is allocated, and
release it when the AFU is released. However, we don't know how big the
SPA needs to be until we read the AFU descriptor.

Therefore, restructure the code:

 - Allocate the SPA only once, on the first attach.

 - Release the SPA only when the entire AFU is being released (not
   detached). Guard the release with a NULL check, so we don't free
   if it was never allocated (e.g. dedicated mode)

Acked-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
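
The restructured lifetime (allocate on first attach, keep across detach, free only on release with a NULL check) can be sketched as follows. The afu_spa structure, the allocs counter and the function names are hypothetical, and calloc/free stand in for whatever allocator the driver uses.

```c
#include <stddef.h>
#include <stdlib.h>

struct afu_spa {
    void *spa;        /* NULL until the first attach */
    size_t spa_size;
    int allocs;       /* demo only: how many times the SPA was allocated */
};

/* Allocate the SPA only once, on the first attach. */
int attach_afu_directed(struct afu_spa *afu, size_t size)
{
    if (afu->spa == NULL) {
        afu->spa = calloc(1, size);
        if (afu->spa == NULL)
            return -1;
        afu->spa_size = size;
        afu->allocs++;
    }
    return 0;
}

/* Detach deliberately keeps the SPA: contexts may still point into it. */
void detach_afu_directed(struct afu_spa *afu)
{
    (void)afu;
}

/* Release the SPA with the AFU; the NULL check covers "never allocated". */
void release_afu(struct afu_spa *afu)
{
    if (afu->spa != NULL) {
        free(afu->spa);
        afu->spa = NULL;
        afu->spa_size = 0;
    }
}
```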

cxl: Drop commands if the PCI channel is not in normal state · 0b3f9c75

Authored by Daniel Axtens on Aug 14, 2015

If the PCI channel has gone down, don't attempt to poke the hardware.

We need to guard every time cxl_whatever_(read|write) is called. This
is because a call to those functions will dereference an offset into an
mmio register, and the mmio mappings get invalidated in the EEH
teardown.

Check in the read/write functions in the header.
We give them the same semantics as usual PCI operations:
 - a write to a channel that is down is ignored.
 - a read from a channel that is down returns all fs.

Also, we try to access the MMIO space of a vPHB device as part of the
PCI disable path. Because that's a read that bypasses most of our usual
checks, we handle it explicitly.

As far as user visible warnings go:
 - Check link state in file ops, return -EIO if down.
 - Be reasonably quiet if there's an error in a teardown path,
   or when we already know the hardware is going down.
 - Throw a big WARN if someone tries to start a CXL operation
   while the card is down. This gives a useful stacktrace for
   debugging whatever is doing that.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
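
The read/write semantics listed in this message (writes to a down channel are ignored, reads return all Fs) can be illustrated with guarded accessors over a fake register array. The adapter_model structure and the accessor names are hypothetical; the real helpers operate on MMIO mappings and check the PCI channel state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical adapter model: a link flag and a small register file. */
struct adapter_model {
    bool link_ok;
    uint64_t regs[8];
};

/* A write to a channel that is down is silently ignored. */
static inline void reg_write(struct adapter_model *a, unsigned int idx, uint64_t val)
{
    if (a->link_ok)
        a->regs[idx] = val;
}

/* A read from a channel that is down returns all Fs. */
static inline uint64_t reg_read(const struct adapter_model *a, unsigned int idx)
{
    return a->link_ok ? a->regs[idx] : UINT64_MAX;
}
```

Putting the check inside the accessors means every caller gets the same behaviour without having to test the link state itself.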

cxl: Convert MMIO read/write macros to inline functions · 588b34be

Authored by Daniel Axtens on Aug 14, 2015

We're about to make these more complex, so make them functions
first.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

03 Jun, 2015 6 commits

cxl: Add AFU virtual PHB and kernel API · 6f7f0b3d

Authored by Michael Neuling on May 27, 2015

This patch does two things.

Firstly it presents the Accelerator Function Units (AFUs) behind the POWER
Service Layer (PSL) as PCI devices on a virtual PCI Host Bridge (vPHB). This
is in addition to the PSL being a PCI device itself.

As part of the Coherent Accelerator Interface Architecture (CAIA) AFUs can
provide an AFU configuration. This AFU configuration record is architected to
be the same as a PCI config space.

This patch discovers the AFU configuration records and provides AFU config
space read/write functions to these configuration records. It then enumerates
the PCI bus. It also hooks in PCI ops where appropriate. It also destroys the
vPHB when the physical card is removed.

Secondly, it adds an in-kernel API for AFUs to use CXL. AFUs must present a
driver that firstly binds as a PCI device. This PCI device can then be used
to do CXL specific operations (that can't sit in the PCI ops) using this API.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Export file ops for use by API · 0520336a

Authored by Michael Neuling on May 27, 2015

The cxl kernel API will allow drivers other than cxl to export a file
descriptor which has the same userspace API.  These file descriptors will be
able to be used against libcxl.

This exports those file ops for use by other drivers.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Move include file cxl.h -> cxl-base.h · ec249dd8

Authored by Michael Neuling on May 27, 2015

This moves the current include file from cxl.h -> cxl-base.h.  This current
include file is used only to pass information between the base driver that
needs to be built into the kernel and the cxl module.

This is to make way for a new include/misc/cxl.h which will
contain just the kernel API for other drivers to use.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Split afu_register_irqs() function · c358d84b

Authored by Michael Neuling on May 27, 2015

Split the afu_register_irqs() function so that different parts can
be useful elsewhere.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: Export some symbols · 1a1a94b8

Authored by Michael Neuling on May 27, 2015

Export some symbols which will soon be used elsewhere in this driver.

Now they are global we rename them so as to avoid collisions.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

cxl: cxl_afu_reset() -> __cxl_afu_reset() · b12994fb

Authored by Michael Neuling on May 27, 2015

Rename cxl_afu_reset() to __cxl_afu_reset() so we can reuse this function name
in the API.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

openeuler / Kernel synced successfully more than a year ago