提交 · 05c6cfb9dce0d13d37e9d007ee6a4af36f1c0a58 · openanolis / cloud-kernel

11 6月, 2015 6 次提交

powerpc/iommu/powernv: Release replaced TCE · 05c6cfb9

由 Alexey Kardashevskiy 提交于 6月 05, 2015

At the moment writing new TCE value to the IOMMU table fails with EBUSY
if there is a valid entry already. However PAPR specification allows
the guest to write new TCE value without clearing it first.

Another problem this patch is addressing is the use of pool locks for
external IOMMU users such as VFIO. The pool locks are to protect
DMA page allocator rather than entries and since the host kernel does
not control what pages are in use, there is no point in pool locks and
exchange()+put_page(oldtce) is sufficient to avoid possible races.

This adds an exchange() callback to iommu_table_ops which does the same
thing as set() plus it returns replaced TCE and DMA direction so
the caller can release the pages afterwards. The exchange() receives
a physical address unlike set() which receives linear mapping address;
and returns a physical address as the clear() does.

This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement
for a platform to have exchange() implemented in order to support VFIO.

This replaces iommu_tce_build() and iommu_clear_tce() with
a single iommu_tce_xchg().

This makes sure that TCE permission bits are not set in TCE passed to
IOMMU API as those are to be calculated by platform code from
DMA direction.

This moves SetPageDirty() to the IOMMU code to make it work for both
VFIO ioctl interface in in-kernel TCE acceleration (when it becomes
available later).
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: NAlex Williamson <alex.williamson@redhat.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

05c6cfb9

powerpc/powernv: Implement accessor to TCE entry · c5bb44ed

由 Alexey Kardashevskiy 提交于 6月 05, 2015

This replaces direct accesses to TCE table with a helper which
returns an TCE entry address. This does not make difference now but will
when multi-level TCE tables get introduces.

No change in behavior is expected.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c5bb44ed

powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group · 0eaf4def

由 Alexey Kardashevskiy 提交于 6月 05, 2015

So far one TCE table could only be used by one IOMMU group. However
IODA2 hardware allows programming the same TCE table address to
multiple PE allowing sharing tables.

This replaces a single pointer to a group in a iommu_table struct
with a linked list of groups which provides the way of invalidating
TCE cache for every PE when an actual TCE table is updated. This adds
pnv_pci_link_table_and_group() and pnv_pci_unlink_table_and_group()
helpers to manage the list. However without VFIO, it is still going
to be a single IOMMU group per iommu_table.

This changes iommu_add_device() to add a device to a first group
from the group list of a table as it is only called from the platform
init code or PCI bus notifier and at these moments there is only
one group per table.

This does not change TCE invalidation code to loop through all
attached groups in order to simplify this patch and because
it is not really needed in most cases. IODA2 is fixed in a later
patch.

This should cause no behavioural change.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: NAlex Williamson <alex.williamson@redhat.com>
Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0eaf4def

powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free() · decbda25

由 Alexey Kardashevskiy 提交于 6月 05, 2015

The pnv_pci_ioda_tce_invalidate() helper invalidates TCE cache. It is
supposed to be called on IODA1/2 and not called on p5ioc2. It receives
start and end host addresses of TCE table.

IODA2 actually needs PCI addresses to invalidate the cache. Those
can be calculated from host addresses but since we are going
to implement multi-level TCE tables, calculating PCI address from
a host address might get either tricky or ugly as TCE table remains flat
on PCI bus but not in RAM.

This moves pnv_pci_ioda_tce_invalidate() from generic pnv_tce_build/
pnt_tce_free and defines IODA1/2-specific callbacks which call generic
ones and do PHB-model-specific TCE cache invalidation. P5IOC2 keeps
using generic callbacks as before.

This changes pnv_pci_ioda2_tce_invalidate() to receives TCE index and
number of pages which are PCI addresses shifted by IOMMU page shift.

No change in behaviour is expected.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

decbda25

powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table · da004c36

由 Alexey Kardashevskiy 提交于 6月 05, 2015

This adds a iommu_table_ops struct and puts pointer to it into
the iommu_table struct. This moves tce_build/tce_free/tce_get/tce_flush
callbacks from ppc_md to the new struct where they really belong to.

This adds the requirement for @it_ops to be initialized before calling
iommu_init_table() to make sure that we do not leave any IOMMU table
with iommu_table_ops uninitialized. This is not a parameter of
iommu_init_table() though as there will be cases when iommu_init_table()
will not be called on TCE tables, for example - VFIO.

This does s/tce_build/set/, s/tce_free/clear/ and removes "tce_"
redundant prefixes.

This removes tce_xxx_rm handlers from ppc_md but does not add
them to iommu_table_ops as this will be done later if we decide to
support TCE hypercalls in real mode. This removes _vm callbacks as
only virtual mode is supported by now so this also removes @rm parameter.

For pSeries, this always uses tce_buildmulti_pSeriesLP/
tce_buildmulti_pSeriesLP. This changes multi callback to fall back to
tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not
present. The reason for this is we still have to support "multitce=off"
boot parameter in disable_multitce() and we do not want to walk through
all IOMMU tables in the system and replace "multi" callbacks with single
ones.

For powernv, this defines _ops per PHB type which are P5IOC2/IODA1/IODA2.
This makes the callbacks for them public. Later patches will extend
callbacks for IODA1/2.

No change in behaviour is expected.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

da004c36

powerpc/powernv: Do not set "read" flag if direction==DMA_NONE · 10b35b2b

由 Alexey Kardashevskiy 提交于 6月 05, 2015

Normally a bitmap from the iommu_table is used to track what TCE entry
is in use. Since we are going to use iommu_table without its locks and
do xchg() instead, it becomes essential not to put bits which are not
implied in the direction flag as the old TCE value (more precisely -
the permission bits) will be used to decide whether to put the page or not.

This adds iommu_direction_to_tce_perm() (its counterpart is there already)
and uses it for powernv's pnv_tce_build().
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

10b35b2b

03 6月, 2015 1 次提交

powerpc/pci: Add shutdown hook to pci_controller_ops · 7a8e6bbf

由 Michael Neuling 提交于 5月 27, 2015

Currently pnv_pci_shutdown() calls the PHB shutdown code for all PHBs in the
system.  It dereferences the private_data assuming it's a powernv PHB, which
won't be the case when we have different PHB in the systems (like when we add
vPHBs for CXL).

This moves the shutdown hook to the pci_controller_ops and fixes the call site
to use that instead.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7a8e6bbf

02 6月, 2015 2 次提交

powerpc/powernv: Move dma_set_mask() from pnv_phb to pci_controller_ops · 763d2d8d

由 Daniel Axtens 提交于 4月 28, 2015

Previously, dma_set_mask() on powernv was convoluted:
 0) Call dma_set_mask() (a/p/kernel/dma.c)
 1) In dma_set_mask(), ppc_md.dma_set_mask() exists, so call it.
 2) On powernv, that function pointer is pnv_dma_set_mask().
    In pnv_dma_set_mask(), the device is pci, so call pnv_pci_dma_set_mask().
 3) In pnv_pci_dma_set_mask(), call pnv_phb->set_dma_mask() if it exists.
 4) It only exists in the ioda case, where it points to
    pnv_pci_ioda_dma_set_mask(), which is the final function.

So the call chain is:
 dma_set_mask() ->
  pnv_dma_set_mask() ->
   pnv_pci_dma_set_mask() ->
    pnv_pci_ioda_dma_set_mask()

Both ppc_md and pnv_phb function pointers are used.

Rip out the ppc_md call, pnv_dma_set_mask() and pnv_pci_dma_set_mask().

Instead:
 0) Call dma_set_mask() (a/p/kernel/dma.c)
 1) In dma_set_mask(), the device is pci, and pci_controller_ops.dma_set_mask()
    exists, so call pci_controller_ops.dma_set_mask()
 2) In the ioda case, that points to pnv_pci_ioda_dma_set_mask().

The new call chain is
 dma_set_mask() ->
  pnv_pci_ioda_dma_set_mask()

Now only the pci_controller_ops function pointer is used.

The fallback paths for p5ioc2 are the same.

Previously, pnv_pci_dma_set_mask() would find no pnv_phb->set_dma_mask()
function, to it would call __set_dma_mask().

Now, dma_set_mask() finds no ppc_md call or pci_controller_ops call,
so it calls __set_dma_mask().
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

763d2d8d

powerpc/powernv: Specialise pci_controller_ops for each controller type · 92ae0353

由 Daniel Axtens 提交于 4月 28, 2015

Remove powernv generic PCI controller operations. Replace it with
controller ops for each of the two supported PHBs.

As an added bonus, make the two new structs const, which will help
guard against bugs such as the one introduced in 65ebf4b6
("powerpc/powernv: Move controller ops from ppc_md to controller_ops")
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

92ae0353

22 5月, 2015 1 次提交

powerpc/powernv: Move MSI-related ops to pci_controller_ops · d6381119

由 Daniel Axtens 提交于 4月 14, 2015

Move the PowerNV/BML platform to use the pci_controller_ops structure
rather than ppc_md for MSI related PCI controller operations.
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d6381119

11 4月, 2015 1 次提交

powerpc/powernv: Move controller ops from ppc_md to controller_ops · 65ebf4b6

由 Daniel Axtens 提交于 3月 31, 2015

This moves the PowerNV platform to use the pci_controller_ops
structure rather than ppc_md for PCI controller operations.
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

65ebf4b6

07 4月, 2015 1 次提交

powerpc/powernv: Remove powernv RTAS support · 646b54f2

由 Michael Ellerman 提交于 3月 12, 2015

The powernv code has some conditional support for running on bare metal
machines that have no OPAL firmware, but provide RTAS.

No released machines ever supported that, and even in the lab it was
just a transitional hack in the days when OPAL was still being
developed.

So remove the code.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: NStewart Smith <stewart@linux.vnet.ibm.com>

646b54f2

31 3月, 2015 1 次提交

powerpc/powernv: Shift VF resource with an offset · 781a868f

由 Wei Yang 提交于 3月 25, 2015

On PowerNV platform, resource position in M64 BAR implies the PE# the
resource belongs to. In some cases, adjustment of a resource is necessary
to locate it to a correct position in M64 BAR .

This patch adds pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR
address according to an offset.

Note:

    After doing so, there would be a "hole" in the /proc/iomem when offset
    is a positive value. It looks like the device return some mmio back to
    the system, which actually no one could use it.

[bhelgaas: rework loops, rework overlap check, index resource[]
conventionally, remove pci_regs.h include, squashed with next patch]
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

781a868f

24 3月, 2015 1 次提交

powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor · 3532a741

由 Gavin Shan 提交于 3月 17, 2015

The PCI config accessors previously relied on device_node.  Unfortunately,
VFs don't have a corresponding device_node, so change the accessors to use
pci_dn instead.

[bhelgaas: changelog]
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

3532a741

04 3月, 2015 1 次提交

powerpc/iommu: Remove IOMMU device references via bus notifier · 4ad04e59

由 Nishanth Aravamudan 提交于 2月 21, 2015

After d905c5df ("PPC: POWERNV: move iommu_add_device earlier"), the
refcnt on the kobject backing the IOMMU group for a PCI device is
elevated by each call to pci_dma_dev_setup_pSeriesLP() (via
set_iommu_table_base_and_group). When we go to dlpar a multi-function
PCI device out:

        iommu_reconfig_notifier ->
                iommu_free_table ->
                        iommu_group_put
                        BUG_ON(tbl->it_group)

We trip this BUG_ON, because there are still references on the table, so
it is not freed. Fix this by moving the powernv bus notifier to common
code and calling it for both powernv and pseries.

Fixes: d905c5df ("PPC: POWERNV: move iommu_add_device earlier")
Signed-off-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4ad04e59

23 1月, 2015 1 次提交

powerpc/powernv: Remove pnv_pci_probe_mode() · a113de37

由 Gavin Shan 提交于 12月 11, 2014

The callback (ppc_md.pci_probe_mode()) is used to determine if the
child PCI devices of the indicated PCI bus should be probed from
device-tree or hardware. On PowerNV platform, we always expect
probing PCI devices from hardware, which is PowerPC PCI core's
default behaviour. Also, the callback had some delay implemented
based on PHB's device node property "reset-clear-timestamp", which
wasn't exported from skiboot. So we don't need this function and
it's safe to remove it.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a113de37

24 11月, 2014 1 次提交

powerpc/powernv: Honor the generic "no_64bit_msi" flag · 36074381

由 Benjamin Herrenschmidt 提交于 10月 07, 2014

Instead of the arch specific quirk which we are deprecating
and that drivers don't understand.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org>

36074381

23 11月, 2014 1 次提交

PCI/MSI: Rename write_msi_msg() to pci_write_msi_msg() · 83a18912

由 Jiang Liu 提交于 11月 09, 2014

Rename write_msi_msg() to pci_write_msi_msg() to mark it as PCI
specific.
Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

83a18912

10 11月, 2014 1 次提交

powerpc: Remove superfluous bootmem includes · 68cf0d64

由 Anton Blanchard 提交于 9月 17, 2014

Lots of places included bootmem.h even when not using bootmem.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Tested-by: NEmil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

68cf0d64

15 10月, 2014 1 次提交

powerpc/eeh: Rename flag EEH_PE_RESET to EEH_PE_CFG_BLOCKED · 8a6b3710

由 Gavin Shan 提交于 10月 01, 2014

The flag EEH_PE_RESET indicates blocking config space of the PE
during reset time. We potentially need block PE's config space
other than reset time. So it's reasonable to replace it with
EEH_PE_CFG_BLOCKED to indicate its usage.

There are no substantial code or logic changes in this patch.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8a6b3710

02 10月, 2014 1 次提交

PCI/MSI/PPC: Remove arch_msi_check_device() · 6b2fd7ef

由 Alexander Gordeev 提交于 9月 07, 2014

Move MSI checks from arch_msi_check_device() to arch_setup_msi_irqs().
This makes the code more compact and allows removing
arch_msi_check_device() from generic MSI code.
Signed-off-by: NAlexander Gordeev <agordeev@redhat.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NMichael Ellerman <mpe@ellerman.id.au>

6b2fd7ef

30 9月, 2014 1 次提交

powerpc/powernv: Override dma_get_required_mask() · fe7e85c6

由 Gavin Shan 提交于 9月 30, 2014

The dma_get_required_mask() function is used by some drivers to
query the platform about what DMA mask is needed to cover all of
memory. This is a bit of a strange semantic when we have to choose
between IOMMU translation or bypass, but essentially what it means
is "what DMA mask will give best performances".

Currently, our IOMMU backend always returns a 32-bit mask here, we
don't do anything special to it when we have bypass available. This
causes some drivers to choose a 32-bit mask, thus losing the ability
to use the bypass window, thinking this is more efficient. The problem
was reported from the driver of following device:

0004:03:00.0 0107: 1000:0087 (rev 05)
0004:03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios \
             Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)

This patch adds an override of that function in order to, instead,
return a 64-bit mask whenever a bypass window is available in order
for drivers to prefer this configuration.
Reported-by: NMurali N. Iyer <mniyer@us.ibm.com>
Suggested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

fe7e85c6

05 8月, 2014 2 次提交

powerpc/powernv: Handle compound PE in config accessors · 98fd7002

由 Gavin Shan 提交于 7月 21, 2014

The PCI config accessors check for PE frozen state and clear it if
EEH isn't functional. The patch handles compound PE in config accessors
if PHB supports it. For consistency, all PEs will be put into frozen
state if any one in compound group gets frozen by hardware.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

98fd7002

powerpc/eeh: Make diag-data not endian dependent · f18440fb

由 Gavin Shan 提交于 7月 17, 2014

It's followup of commit ddf0322a ("powerpc/powernv: Fix endianness
problems in EEH"). The patch helps to get non-endian-dependent
diag-data.

Cc: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

f18440fb

28 7月, 2014 1 次提交

powerpc/powernv: Switch powernv drivers to use machine_xxx_initcall() · b14726c5

由 Michael Ellerman 提交于 7月 15, 2014

A lot of the code in platforms/powernv is using non-machine initcalls.
That means if a kernel built with powernv support runs on another
platform, for example pseries, the initcalls will still run.

That is usually OK, because the initcalls will check for something in
the device tree or elsewhere before doing anything, so on other
platforms they will usually just return.

But it's fishy for powernv code to be running on other platforms, so
switch them all to be machine initcalls. If we want any of them to run
on other platforms in future they should move to sysdev.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

b14726c5

11 7月, 2014 2 次提交

powerpc/powernv: Add a page size parameter to pnv_pci_setup_iommu_table() · 8fa5d454

由 Alexey Kardashevskiy 提交于 6月 06, 2014

Since a TCE page size can be other than 4K, make it configurable for
P5IOC2 and IODA PHBs.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

8fa5d454

powerpc/powernv: Use it_page_shift in TCE build · bc32057e

由 Alexey Kardashevskiy 提交于 6月 06, 2014

This makes use of iommu_table::it_page_shift instead of TCE_SHIFT and
TCE_RPN_SHIFT hardcoded values.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

bc32057e

11 6月, 2014 1 次提交

powerpc/powernv: Fix endianness problems in EEH · ddf0322a

由 Guo Chao 提交于 6月 09, 2014

EEH information fetched from OPAL need fix before using in LE environment.
To be included in sparse's endian check, declare them as __beXX and
access them by accessors.

Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NGuo Chao <yan@linux.vnet.ibm.com>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

ddf0322a

28 4月, 2014 4 次提交

powerpc/eeh: No hotplug on permanently removed dev · d2b0f6f7

由 Gavin Shan 提交于 4月 24, 2014

The issue was detected in a bit complicated test case where
we have multiple hierarchical PEs shown as following figure:

                +-----------------+
                | PE#3     p2p#0  |
                |          p2p#1  |
                +-----------------+
                        |
                +-----------------+
                | PE#4     pdev#0 |
                |          pdev#1 |
                +-----------------+

PE#4 (have 2 PCI devices) is the child of PE#3, which has 2 p2p
bridges. We accidentally had less-known scenario: PE#4 was removed
permanently from the system because of permanent failure (e.g.
exceeding the max allowd failure times in last hour), then we detects
EEH errors on PE#3 and tried to recover it. However, eeh_dev instances
for pdev#0/1 were not detached from PE#4, which was still connected to
PE#3. All of that was because of the fact that we rely on count-based
pcibios_release_device(), which isn't reliable enough. When doing
recovery for PE#3, we still apply hotplug on PE#4 and pdev#0/1, which
are not valid any more. Eventually, we run into kernel crash.

The patch fixes above issue from two aspects. For unplug, we simply
skip those permanently removed PE, whose state is (EEH_PE_STATE_ISOLATED
&& !EEH_PE_STATE_RECOVERING) and its frozen count should be greater
than EEH_MAX_ALLOWED_FREEZES. For plug, we marked all permanently
removed EEH devices with EEH_DEV_REMOVED and return 0xFF's on read
its PCI config so that PCI core will omit them.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d2b0f6f7

powerpc/eeh: Block PCI-CFG access during PE reset · d0914f50

由 Gavin Shan 提交于 4月 24, 2014

We've observed multiple PE reset failures because of PCI-CFG
access during that period. Potentially, some device drivers
can't support EEH very well and they can't put the device to
motionless state before PE reset. So those device drivers might
produce PCI-CFG accesses during PE reset. Also, we could have
PCI-CFG access from user space (e.g. "lspci"). Since access to
frozen PE should return 0xFF's, we can block PCI-CFG access
during the period of PE reset so that we won't get recrusive EEH
errors.

The patch adds flag EEH_PE_RESET, which is kept during PE reset.
The PowerNV/pSeries PCI-CFG accessors reuse the flag to block
PCI-CFG accordingly.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d0914f50

powerpc/powernv: Remove fields in PHB diag-data dump · b34497d1

由 Gavin Shan 提交于 4月 24, 2014

For some fields (e.g. LEM, MMIO, DMA) in PHB diag-data dump, it's
meaningless to print them if they have non-zero value in the
corresponding mask registers because we always have non-zero values
in the mask registers. The patch only prints those fieds if we
have non-zero values in the primary registers (e.g. LEM, MMIO, DMA
status) so that we can save couple of lines. The patch also removes
unnecessary spare line before "brdgCtl:" and two leading spaces as
prefix in each line as Ben suggested.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

b34497d1

powerpc/powernv: Move PNV_EEH_STATE_ENABLED around · f5bc6b70

由 Gavin Shan 提交于 4月 24, 2014

The flag PNV_EEH_STATE_ENABLED is put into pnv_phb::eeh_state,
which is protected by CONFIG_EEH. We needn't that. Instead, we
can have pnv_phb::flags and maintain all flags there, which is
the purpose of the patch. The patch also renames PNV_EEH_STATE_ENABLED
to PNV_PHB_FLAG_EEH.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

f5bc6b70

28 2月, 2014 1 次提交

powerpc/powernv: Refactor PHB diag-data dump · af87d2fe

由 Gavin Shan 提交于 2月 25, 2014

As Ben suggested, the patch prints PHB diag-data with multiple
fields in one line and omits the line if the fields of that
line are all zero.

With the patch applied, the PHB3 diag-data dump looks like:

PHB3 PHB#3 Diag-data (Version: 1)

  brdgCtl:     00000002
  RootSts:     0000000f 00400000 b0830008 00100147 00002000
  nFir:        0000000000000000 0030006e00000000 0000000000000000
  PhbSts:      0000001c00000000 0000000000000000
  Lem:         0000000000100000 42498e327f502eae 0000000000000000
  InAErr:      8000000000000000 8000000000000000 0402030000000000 0000000000000000
  PE[  8] A/B: 8480002b00000000 8000000000000000

[ The current diag data is so big that it overflows the printk
  buffer pretty quickly in cases when we get a handful of errors
  at once which can happen. --BenH
]
Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

af87d2fe

11 2月, 2014 1 次提交

powerpc/powernv: Add iommu DMA bypass support for IODA2 · cd15b048

由 Benjamin Herrenschmidt 提交于 2月 11, 2014

This patch adds the support for to create a direct iommu "bypass"
window on IODA2 bridges (such as Power8) allowing to bypass iommu
page translation completely for 64-bit DMA capable devices, thus
significantly improving DMA performances.

Additionally, this adds a hook to the struct iommu_table so that
the IOMMU API / VFIO can disable the bypass when external ownership
is requested, since in that case, the device will be used by an
environment such as userspace or a KVM guest which must not be
allowed to bypass translations.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

cd15b048

30 12月, 2013 3 次提交

powerpc/iommu: Update the generic code to use dynamic iommu page sizes · d0847757

由 Alistair Popple 提交于 12月 09, 2013

This patch updates the generic iommu backend code to use the
it_page_shift field to determine the iommu page size instead of
using hardcoded values.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d0847757

powerpc/iommu: Add it_page_shift field to determine iommu page size · 3a553170

由 Alistair Popple 提交于 12月 09, 2013

This patch adds a it_page_shift field to struct iommu_table and
initiliases it to 4K for all platforms.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

3a553170

powerpc/iommu: Update constant names to reflect their hardcoded page size · e589a440

由 Alistair Popple 提交于 12月 09, 2013

The powerpc iommu uses a hardcoded page size of 4K. This patch changes
the name of the IOMMU_PAGE_* macros to reflect the hardcoded values. A
future patch will use the existing names to support dynamic page
sizes.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

e589a440

05 12月, 2013 2 次提交

powerpc/powernv: Move PHB-diag dump functions around · 93aef2a7

由 Gavin Shan 提交于 11月 22, 2013

Prior to the completion of PCI enumeration, we actively detects
EEH errors on PCI config cycles and dump PHB diag-data if necessary.
The EEH backend also dumps PHB diag-data in case of frozen PE or
fenced PHB. However, we are using different functions to dump the
PHB diag-data for those 2 cases.

The patch merges the functions for dumping PHB diag-data to one so
that we can avoid duplicate code. Also, we never dump PHB3 diag-data
during PCI config cycles with frozen PE. The patch fixes it as well.
Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

93aef2a7

PPC: POWERNV: move iommu_add_device earlier · d905c5df

由 Alexey Kardashevskiy 提交于 11月 21, 2013

The current implementation of IOMMU on sPAPR does not use iommu_ops
and therefore does not call IOMMU API's bus_set_iommu() which
1) sets iommu_ops for a bus
2) registers a bus notifier
Instead, PCI devices are added to IOMMU groups from
subsys_initcall_sync(tce_iommu_init) which does basically the same
thing without using iommu_ops callbacks.

However Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158)
implements iommu_ops and when tce_iommu_init is called, every PCI device
is already added to some group so there is a conflict.

This patch does 2 things:
1. removes the loop in which PCI devices were added to groups and
adds explicit iommu_add_device() calls to add devices as soon as they get
the iommu_table pointer assigned to them.
2. moves a bus notifier to powernv code in order to avoid conflict with
the notifier from Freescale driver.

iommu_add_device() and iommu_del_device() are public now.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d905c5df

06 11月, 2013 1 次提交

powerpc/powernv: Reserve the correct PE number · 36954dc7

由 Gavin Shan 提交于 11月 04, 2013

We're assigning PE numbers after the completion of PCI probe. During
the PCI probe, we had PE#0 as the super container to encompass all
PCI devices. However, that's inappropriate since PELTM has ascending
order of priority on search on P7IOC. So we need PE#127 takes the
role that PE#0 has previously. For PHB3, we still have PE#0 as the
reserved PE.

The patch supposes that the underly firmware has built the RID to
PE# mapping after resetting IODA tables: all PELTM entries except
last one has invalid mapping on P7IOC, but all RTEs have binding
to PE#0. The reserved PE# is being exported by firmware by device
tree.
Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

36954dc7

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功