提交 · 79dbddaf8e9613a529d1b07f181f07894bf6fb8e · bug2833 / cloud-kernel

08 12月, 2015 1 次提交

cxl: Set endianess of kernel contexts · e606e035

由 Frederic Barrat 提交于 12月 07, 2015

A process element (defined in CAIA) keeps track of the endianess of
contexts through the Little Endian (LE) bit of the State Register. It
is currently set for user contexts, but was somehow forgotten for
kernel contexts, so this patch fixes it.
It could lead to erratic behavior from an AFU when the context is
attached through the kernel API.

Fixes: 2f663527 ("cxl: Configure PSL for kernel contexts and merge code")
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Suggested-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e606e035

07 10月, 2015 1 次提交

cxl: Fix number of allocated pages in SPA · 4108efb0

由 Christophe Lombard 提交于 10月 07, 2015

The scheduled process area is currently allocated before assigning the
correct maximum processes to the AFU, which will mean we only ever
allocate a fixed number of pages for the scheduled process area. This
will limit us to 958 processes with 2 x 64K pages. If we try to use more
processes than that we'd probably overrun the buffer and corrupt memory
or crash.

AFUs that require three or more interrupts per process will not be
affected as they are already limited to less processes than that, but we
could hit it on an AFU that requires 0, 1 or 2 interrupts per process,
or when using 4K pages.

This patch moves the initialisation of the num_procs to before the SPA
allocation so that enough pages will be allocated for the number of
processes that the AFU supports.
Signed-off-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
Cc: <stable@vger.kernel.org> # 3.18+
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4108efb0

14 8月, 2015 2 次提交

cxl: Allocate and release the SPA with the AFU · 05155772

由 Daniel Axtens 提交于 8月 14, 2015

Previously the SPA was allocated and freed upon entering and leaving
AFU-directed mode. This causes some issues for error recovery - contexts
hold a pointer inside the SPA, and they may persist after the AFU has
been detached.

We would ideally like to allocate the SPA when the AFU is allocated, and
release it until the AFU is released. However, we don't know how big the
SPA needs to be until we read the AFU descriptor.

Therefore, restructure the code:

 - Allocate the SPA only once, on the first attach.

 - Release the SPA only when the entire AFU is being released (not
   detached). Guard the release with a NULL check, so we don't free
   if it was never allocated (e.g. dedicated mode)
Acked-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

05155772

cxl: Drop commands if the PCI channel is not in normal state · 0b3f9c75

由 Daniel Axtens 提交于 8月 14, 2015

If the PCI channel has gone down, don't attempt to poke the hardware.

We need to guard every time cxl_whatever_(read|write) is called. This
is because a call to those functions will dereference an offset into an
mmio register, and the mmio mappings get invalidated in the EEH
teardown.

Check in the read/write functions in the header.
We give them the same semantics as usual PCI operations:
 - a write to a channel that is down is ignored.
 - a read from a channel that is down returns all fs.

Also, we try to access the MMIO space of a vPHB device as part of the
PCI disable path. Because that's a read that bypasses most of our usual
checks, we handle it explicitly.

As far as user visible warnings go:
 - Check link state in file ops, return -EIO if down.
 - Be reasonably quiet if there's an error in a teardown path,
   or when we already know the hardware is going down.
 - Throw a big WARN if someone tries to start a CXL operation
   while the card is down. This gives a useful stacktrace for
   debugging whatever is doing that.
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0b3f9c75

06 8月, 2015 1 次提交

cxl: Don't ignore add_process_element() result when attaching context · 368857c1

由 Daniel Axtens 提交于 7月 29, 2015

Currently when attaching a context in dedicated mode, we ignore the
result of add_process_element(), which could potentially fail.

If add_process_element() returns an error, pass it back to the caller.
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

368857c1

13 7月, 2015 1 次提交

cxl: use more common format specifier · de369538

由 Rasmus Villemoes 提交于 6月 11, 2015

A precision of 16 (%.16llx) has the same effect as a field width of 16
along with passing the 0 flag (%016llx), but the latter is much more
common in the kernel tree. Update cxl to use that.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

de369538

03 6月, 2015 4 次提交

cxl: Move include file cxl.h -> cxl-base.h · ec249dd8

由 Michael Neuling 提交于 5月 27, 2015

This moves the current include file from cxl.h -> cxl-base.h.  This current
include file is used only to pass information between the base driver that
needs to be built into the kernel and the cxl module.

This is to make way for a new include/misc/cxl.h which will
contain just the kernel API for other driver to use
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ec249dd8

cxl: Configure PSL for kernel contexts and merge code · 2f663527

由 Michael Neuling 提交于 5月 27, 2015

This updates AFU directed and dedicated modes for contexts attached to the
kernel.

The SR (similar to the MSR in the core) calculation is getting
quite complex and is duplicated in AFU directed and dedicated
modes.  This patch also merges this SR calculation for these modes.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2f663527

cxl: Export some symbols · 1a1a94b8

由 Michael Neuling 提交于 5月 27, 2015

Export some symbols which will soon be used elsewhere in this driver.

Now they are global we rename them so to avoid collisions.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1a1a94b8

cxl: cxl_afu_reset() -> __cxl_afu_reset() · b12994fb

由 Michael Neuling 提交于 5月 27, 2015

Rename cxl_afu_reset() to __cxl_afu_reset() to we can reuse this function name
in the API.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b12994fb

22 1月, 2015 1 次提交

cxl: Add tracepoints · 9bcf28cd

由 Ian Munsie 提交于 1月 09, 2015

This patch adds tracepoints throughout the cxl driver, which can provide
insight into:

- Context lifetimes
- Commands sent to the PSL and AFU and their completion status
- Segment and page table misses and their resolution
- PSL and AFU interrupts
- slbia calls from the powerpc copro_fault code

These tracepoints are mostly intended to aid in debugging (particularly
for new AFU designs), and may be useful standalone or in conjunction
with hardware traces collected by the PSL (read out via the trace
interface in debugfs) and AFUs.
Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

9bcf28cd

29 12月, 2014 1 次提交

cxl: Disable SPAP register when freeing SPA · db7933f3

由 Ian Munsie 提交于 12月 08, 2014

When we deactivate the AFU directed mode we free the scheduled process
area, but did not clear the register in the hardware that has a pointer
to it.

This should be fine since we will have already cleared out every context
and we won't do anything that would cause the hardware to access it
until after we have allocated a new one, but just to be safe this patch
clears out the register when we free the page.
Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

db7933f3

12 12月, 2014 2 次提交

cxl: Add timeout to process element commands · a98e6e9f

由 Ian Munsie 提交于 12月 08, 2014

In the event that something goes wrong in the hardware and it is unable
to complete a process element comment we would end up polling forever,
effectively making the associated process unkillable.

This patch adds a timeout to the process element command code path, so
that we will give up if the hardware does not respond in a reasonable
time.

Cc: stable@vger.kernel.org
Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a98e6e9f

cxl: Change contexts_lock to a mutex to fix sleep while atomic bug · ee41d11d

由 Ian Munsie 提交于 12月 08, 2014

We had a known sleep while atomic bug if a CXL device was forcefully
unbound while it was in use. This could occur as a result of EEH, or
manually induced with something like this while the device was in use:

echo 0000:01:00.0 > /sys/bus/pci/drivers/cxl-pci/unbind

The issue was that in this code path we iterated over each context and
forcefully detached it with the contexts_lock spin lock held, however
the detach also needed to take the spu_mutex, and call schedule.

This patch changes the contexts_lock to a mutex so that we are not in
atomic context while doing the detach, thereby avoiding the sleep while
atomic.

Also delete the related TODO comment, which suggested an alternate
solution which turned out to not be workable.

Cc: stable@vger.kernel.org
Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ee41d11d

27 11月, 2014 1 次提交

CXL: Return error to PSL if IRQ demultiplexing fails & print clearer warning · 27bbcef2

由 Ian Munsie 提交于 11月 14, 2014

If an AFU has a hardware bug that causes it to acknowledge a context
terminate or remove while that context has outstanding transactions, it
is possible for the kernel to receive an interrupt for that context
after we have removed it from the context list.

The kernel will not be able to demultiplex the interrupt (or worse - if
we have already reallocated the process handle we could mis-attribute it
to the new context), and printed a big scary warning.

It did not acknowledge the interrupt, which would effectively halt
further translation fault processing on the PSL.

This patch makes the warning clearer about the likely cause of the issue
(i.e. hardware bug) to make it obvious to future AFU designers of what
needs to be fixed. It also prints out the process handle which can then
be matched up with hardware and software traces for debugging.

It also acknowledges the interrupt to the PSL with either an address
error or acknowledge, so that the PSL can continue with other
translations.
Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

27bbcef2

18 11月, 2014 1 次提交

cxl: Return error to PSL if IRQ demultiplexing fails & print clearer warning · bc78b05b

由 Ian Munsie 提交于 11月 14, 2014

If an AFU has a hardware bug that causes it to acknowledge a context
terminate or remove while that context has outstanding transactions, it
is possible for the kernel to receive an interrupt for that context
after we have removed it from the context list.

The kernel will not be able to demultiplex the interrupt (or worse - if
we have already reallocated the process handle we could mis-attribute it
to the new context), and printed a big scary warning.

It did not acknowledge the interrupt, which would effectively halt
further translation fault processing on the PSL.

This patch makes the warning clearer about the likely cause of the issue
(i.e. hardware bug) to make it obvious to future AFU designers of what
needs to be fixed. It also prints out the process handle which can then
be matched up with hardware and software traces for debugging.

It also acknowledges the interrupt to the PSL with either an address
error or acknowledge, so that the PSL can continue with other
translations.
Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

bc78b05b

28 10月, 2014 1 次提交

cxl: Disable secondary hash in segment table · 5100a9d6

由 Ian Munsie 提交于 10月 28, 2014

This patch simplifies the process of finding a free segment table entry
by disabling the secondary hash. This reduces the number of possible
entries in the segment table for a given address from 16 to 8.

Due to the large segment sizes we use it is extremely unlikely that the
secondary hash would ever have been used in practice, so this should not
have any negative impacts and may even improve performance due to the
reduced number of comparisons that software & hardware need to perform.

This patch clears the SC bit in the hardware's state register
(CXL_PSL_SR_An) to disable the secondary hash in the hardware since we
can no longer fill out entries using it.
Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5100a9d6

08 10月, 2014 1 次提交

cxl: Driver code for powernv PCIe based cards for userspace access · f204e0b8

由 Ian Munsie 提交于 10月 08, 2014

This is the core of the cxl driver.

It adds support for using cxl cards in the powernv environment only (ie POWER8
bare metal). It allows access to cxl accelerators by userspace using the
/dev/cxl/afuM.N char devices.

The kernel driver has no knowledge of the function implemented by the
accelerator. It provides services to userspace via the /dev/cxl/afuM.N
devices. When a program opens this device and runs the start work IOCTL, the
accelerator will have coherent access to that processes memory using the same
virtual addresses. That process may mmap the device to access any MMIO space
the accelerator provides.  Also, reads on the device will allow interrupts to
be received. These services are further documented in a later patch in
Documentation/powerpc/cxl.txt.

Documentation of the cxl hardware architecture and userspace API is provided in
subsequent patches.
Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f204e0b8

bug2833 / cloud-kernel 与 Fork 源项目一致

bug2833 / cloud-kernel
与 Fork 源项目一致