提交 · 82eae1afbbdcaf2d716f88025736dc2d6f7afbf0 · openeuler / Kernel

28 4月, 2017 1 次提交

powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc · 82eae1af

由 Alexey Kardashevskiy 提交于 3月 27, 2017

pnv_pci_table_alloc() ignores possible failure from kzalloc_node(),
this adds a check. There are 2 callers of pnv_pci_table_alloc(),
one already checks for tbl!=NULL, this adds WARN_ON() to the other path
which only happens during boot time in IODA1 and not expected to fail.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

82eae1af

11 4月, 2017 1 次提交

powerpc: Create asm/debugfs.h and move powerpc_debugfs_root there · 7644d581

由 Michael Ellerman 提交于 2月 10, 2017

powerpc_debugfs_root is the dentry representing the root of the
"powerpc" directory tree in debugfs.

Currently it sits in asm/debug.h, a long with some other things that
have "debug" in the name, but are otherwise unrelated.

Pull it out into a separate header, which also includes linux/debugfs.h,
and convert all the users to include debugfs.h instead of debug.h.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7644d581

04 4月, 2017 1 次提交

powerpc/powernv: Introduce address translation services for Nvlink2 · 1ab66d1f

由 Alistair Popple 提交于 4月 03, 2017

Nvlink2 supports address translation services (ATS) allowing devices
to request address translations from an mmu known as the nest MMU
which is setup to walk the CPU page tables.

To access this functionality certain firmware calls are required to
setup and manage hardware context tables in the nvlink processing unit
(NPU). The NPU also manages forwarding of TLB invalidates (known as
address translation shootdowns/ATSDs) to attached devices.

This patch exports several methods to allow device drivers to register
a process id (PASID/PID) in the hardware tables and to receive
notification of when a device should stop issuing address translation
requests (ATRs). It also adds a fault handler to allow device drivers
to demand fault pages in.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
[mpe: Fix up comment formatting, use flush_tlb_mm()]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1ab66d1f

30 3月, 2017 3 次提交

powerpc/vfio_spapr_tce: Add reference counting to iommu_table · e5afdf9d

由 Alexey Kardashevskiy 提交于 3月 22, 2017

So far iommu_table obejcts were only used in virtual mode and had
a single owner. We are going to change this by implementing in-kernel
acceleration of DMA mapping requests. The proposed acceleration
will handle requests in real mode and KVM will keep references to tables.

This adds a kref to iommu_table and defines new helpers to update it.
This replaces iommu_free_table() with iommu_tce_table_put() and makes
iommu_free_table() static. iommu_tce_table_get() is not used in this patch
but it will be in the following patch.

Since this touches prototypes, this also removes @node_name parameter as
it has never been really useful on powernv and carrying it for
the pseries platform code to iommu_free_table() seems to be quite
useless as well.

This should cause no behavioral change.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Acked-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e5afdf9d

powerpc/iommu/vfio_spapr_tce: Cleanup iommu_table disposal · 11edf116

由 Alexey Kardashevskiy 提交于 3月 22, 2017

At the moment iommu_table can be disposed by either calling
iommu_table_free() directly or it_ops::free(); the only implementation
of free() is in IODA2 - pnv_ioda2_table_free() - and it calls
iommu_table_free() anyway.

As we are going to have reference counting on tables, we need an unified
way of disposing tables.

This moves it_ops::free() call into iommu_free_table() and makes use
of the latter. The free() callback now handles only platform-specific
data.

As from now on the iommu_free_table() calls it_ops->free(), we need
to have it_ops initialized before calling iommu_free_table() so this
moves this initialization in pnv_pci_ioda2_create_table().

This should cause no behavioral change.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Acked-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

11edf116

powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange() · a540aa56

由 Alexey Kardashevskiy 提交于 3月 22, 2017

In real mode, TCE tables are invalidated using special
cache-inhibited store instructions which are not available in
virtual mode

This defines and implements exchange_rm() callback. This does not
define set_rm/clear_rm/flush_rm callbacks as there is no user for those -
exchange/exchange_rm are only to be used by KVM for VFIO.

The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.

This replaces list_for_each_entry_rcu with its lockless version as
from now on pnv_pci_ioda2_tce_invalidate() can be called in
the real mode too.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a540aa56

20 3月, 2017 1 次提交

powerpc/powernv/npu: Remove dead iommu code · 20f13b95

由 Alexey Kardashevskiy 提交于 2月 21, 2017

PNV_IODA_PE_DEV is only used for NPU devices (emulated PCI bridges
representing NVLink). These are added to IOMMU groups with corresponding
NVIDIA devices after all non-NPU PEs are setup; a special helper -
pnv_pci_ioda_setup_iommu_api() - handles this in pnv_pci_ioda_fixup().

The pnv_pci_ioda2_setup_dma_pe() helper sets up DMA for a PE. It is called
for VFs (so it does not handle NPU case) and PCI bridges but only
IODA1 and IODA2 types. An NPU bridge has its own type id (PNV_PHB_NPU)
so pnv_pci_ioda2_setup_dma_pe() cannot be called on NPU and therefore
(pe->flags & PNV_IODA_PE_DEV) is always "false".

This removes not used iommu_add_device(). This should not cause any
behavioral change.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

20f13b95

09 3月, 2017 2 次提交

powerpc/powernv/ioda2: Update iommu table base on ownership change · db08e1d5

由 Alexey Kardashevskiy 提交于 2月 21, 2017

On POWERNV platform, in order to do DMA via IOMMU (i.e. 32bit DMA in
our case), a device needs an iommu_table pointer set via
set_iommu_table_base().

The codeflow is:
- pnv_pci_ioda2_setup_dma_pe()
	- pnv_pci_ioda2_setup_default_config()
	- pnv_ioda_setup_bus_dma() [1]

pnv_pci_ioda2_setup_dma_pe() creates IOMMU groups,
pnv_pci_ioda2_setup_default_config() does default DMA setup,
pnv_ioda_setup_bus_dma() takes a bus PE (on IODA2, all physical function
PEs as bus PEs except NPU), walks through all underlying buses and
devices, adds all devices to an IOMMU group and sets iommu_table.

On IODA2, when VFIO is used, it takes ownership over a PE which means it
removes all tables and creates new ones (with a possibility of sharing
them among PEs). So when the ownership is returned from VFIO to
the kernel, the iommu_table pointer written to a device at [1] is
stale and needs an update.

This adds an "add_to_group" parameter to pnv_ioda_setup_bus_dma()
(in fact re-adds as it used to be there a while ago for different
reasons) to tell the helper if a device needs to be added to
an IOMMU group with an iommu_table update or just the latter.

This calls pnv_ioda_setup_bus_dma(..., false) from
pnv_ioda2_release_ownership() so when the ownership is restored,
32bit DMA can work again for a device. This does the same thing
on obtaining ownership as the iommu_table point is stale at this point
anyway and it is safer to have NULL there.

We did not hit this earlier as all tested devices in recent years were
only using 64bit DMA; the rare exception for this is MPT3 SAS adapter
which uses both 32bit and 64bit DMA access and it has not been tested
with VFIO much.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

db08e1d5

powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested · 7aafac11

由 Alexey Kardashevskiy 提交于 2月 22, 2017

The IODA2 specification says that a 64 DMA address cannot use top 4 bits
(3 are reserved and one is a "TVE select"); bottom page_shift bits
cannot be used for multilevel table addressing either.

The existing IODA2 table allocation code aligns the minimum TCE table
size to PAGE_SIZE so in the case of 64K system pages and 4K IOMMU pages,
we have 64-4-12=48 bits. Since 64K page stores 8192 TCEs, i.e. needs
13 bits, the maximum number of levels is 48/13 = 3 so we physically
cannot address more and EEH happens on DMA accesses.

This adds a check that too many levels were requested.

It is still possible to have 5 levels in the case of 4K system page size.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7aafac11

28 2月, 2017 1 次提交

scripts/spelling.txt: add "overrided" pattern and fix typo instances · 03671057

由 Masahiro Yamada 提交于 2月 27, 2017

Fix typos and add the following to the scripts/spelling.txt:

overrided||overridden

Link: http://lkml.kernel.org/r/1481573103-11329-22-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

03671057

17 2月, 2017 1 次提交

powerpc/powernv: Remove unused variable in pnv_pci_sriov_disable() · 02983449

由 Gavin Shan 提交于 1月 11, 2017

The local variable @iov isn't used, to remove it.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

02983449

07 2月, 2017 1 次提交

powerpc/powernv: Remove separate entry for OPAL real mode calls · ab9bad0e

由 Benjamin Herrenschmidt 提交于 2月 07, 2017

All entry points already read the MSR so they can easily do
the right thing.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ab9bad0e

30 1月, 2017 1 次提交

powerpc/powernv: Use OPAL call for TCE kill on NVLink2 · 616badd2

由 Alistair Popple 提交于 1月 10, 2017

Add detection of NPU2 PHBs. NPU2/NVLink2 has a different register
layout for the TCE kill register therefore TCE invalidation should be
done via the OPAL call rather than using the register directly as it
is for PHB3 and NVLink1. This changes TCE invalidation to use the OPAL
call in the case of a NPU2 PHB model.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

616badd2

25 1月, 2017 1 次提交

powerpc/powernv/pci: Use kmalloc_array() in two functions · fb37e128

由 Markus Elfring 提交于 8月 24, 2016

Use kmalloc_array(), which checks for overflow of the multiplication,
rather than doing it by hand.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

fb37e128

22 11月, 2016 2 次提交

powerpc/pci: Always print PHB and PE numbers as hexadecimal · 1f52f176

由 Russell Currey 提交于 11月 16, 2016

PHB, PE (and by association MVE) numbers are printed as a mix of decimal
and hexadecimal throughout the kernel.  This can be misleading, so make
them all hexadecimal.

Standardising on hex instead of dec because:

 - PHB numbers are presented in hex in sysfs/debugfs (and lspci, etc)
 - PE numbers are presented as hex in sysfs and parsed in hex in debugfs

The only place I think this could cause confusing are the messages during
boot, i.e.

	pci 000a:01     : [PE# 000] Secondary bus 1 associated with PE#0

which can be a quick way to check PE numbers.  pe_level_printk() will
only print two characters instead of three, so the above would be

	pci 000a:01     : [PE# 00] Secondary bus 1 associated with PE#0

which gives a hint it's in hex.
Signed-off-by: NRussell Currey <ruscur@russell.cc>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1f52f176

powerpc/powernv: Don't warn on PE init if unfreeze is unsupported · d4791db5

由 Russell Currey 提交于 11月 16, 2016

Whenever a PE is initialised in powernv, opal_pci_eeh_freeze_clear() is
called.  This is to remove any existing freeze, and has no negative side
effects if the PE is already in an unfrozen state.  On PHB backends that
don't support this operation and return OPAL_UNSUPPORTED, this creates a
scary and misleading warning message.

Skip the warning message on init if OPAL_UNSUPPORTED is returned.

As far as I'm aware, this currently only affects NPUs.

Fixes: 313483dd ("powerpc/powernv: Unfreeze PE on allocation")
Signed-off-by: NRussell Currey <ruscur@russell.cc>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d4791db5

04 10月, 2016 1 次提交

powerpc/powernv: Fix data type for @r in pnv_ioda_parse_m64_window() · 0e7736c6

由 Gavin Shan 提交于 8月 02, 2016

This fixes warning reported from sparse:

  pci-ioda.c:451:49: warning: incorrect type in argument 2 (different base types)

Fixes: 262af557 ("powerpc/powernv: Enable M64 aperatus for PHB3")
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0e7736c6

29 9月, 2016 1 次提交

powerpc/powernv: Unfreeze PE on allocation · 313483dd

由 Gavin Shan 提交于 9月 28, 2016

This unfreezes PE when it's initialized because the PE might be put
into frozen state in the last hot remove path. It's not harmful to
do so if the PE is already in unfrozen state.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

313483dd

23 9月, 2016 2 次提交

powerpc/powernv: Fix comment style and spelling · 6060e9ea

由 Andrew Donnellan 提交于 9月 16, 2016

Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6060e9ea

powerpc/powernv/pci: Add PHB register dump debugfs handle · 98b665da

由 Russell Currey 提交于 7月 28, 2016

On EEH events the kernel will print a dump of relevant registers.
If EEH is unavailable (i.e. CONFIG_EEH is disabled, a new platform
doesn't have EEH support, etc) this information isn't readily available.

Add a new debugfs handler to trigger a PHB register dump, so that this
information can be made available on demand.
Signed-off-by: NRussell Currey <ruscur@russell.cc>
Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

98b665da

21 9月, 2016 1 次提交

powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment · b79331a5

由 Russell Currey 提交于 9月 14, 2016

Commit 5958d19a checks for prefetchable m64 BARs by comparing the
addresses instead of using resource flags.  This broke SR-IOV as the m64
check in pnv_pci_ioda_fixup_iov_resources() fails.

The condition in pnv_pci_window_alignment() also changed to checking
only IORESOURCE_MEM_64 instead of both IORESOURCE_MEM_64 and
IORESOURCE_PREFETCH.

Revert these cases to the previous behaviour, adding a new helper function
to do so.  This is named pnv_pci_is_m64_flags() to make it clear this
function is only looking at resource flags and should not be relied on for
non-SRIOV resources.

Fixes: 5958d19a ("Fix incorrect PE reservation attempt on some 64-bit BARs")
Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NRussell Currey <ruscur@russell.cc>
Tested-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b79331a5

15 9月, 2016 2 次提交

powerpc/powernv/pci: Fix missed TCE invalidations that should fallback to OPAL · ed7d9a1d

由 Michael Ellerman 提交于 9月 15, 2016

In commit f0228c41 ("powerpc/powernv/pci: Fallback to OPAL for TCE
invalidations"), we added logic to fallback to OPAL for doing TCE
invalidations if we can't do it in Linux.

Ben sent a v2 of the patch, containing these additional call sites, but
I had already applied v1 and didn't notice. So fix them now.

Fixes: f0228c41 ("powerpc/powernv/pci: Fallback to OPAL for TCE invalidations")
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ed7d9a1d

powerpc/powernv: Detach from PE on releasing PCI device · 29bf282d

由 Gavin Shan 提交于 9月 06, 2016

The PCI hotplug can be part of EEH error recovery. The @pdn and
the device's PE number aren't removed and added afterwords. The
PE number in @pdn should be set to an invalid one. Otherwise, the
PE's device count is decreased on removing devices while failing
to be increased on adding devices. It leads to unbalanced PE's
device count and make normal PCI hotplug path broken.

Fixes: c5f7700b ("powerpc/powernv: Dynamically release PE")
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

29bf282d

14 9月, 2016 1 次提交

powerpc/powernv: Fix the state of root PE · 6eaed166

由 Gavin Shan 提交于 9月 13, 2016

The PE for root bus (root PE) can be removed because of PCI hot
remove in EEH recovery path for fenced PHB error. We need update
@phb->root_pe_populated accordingly so that the root PE can be
populated again in forthcoming PCI hot add path. Also, the PE
shouldn't be destroyed as it's global and reserved resource.

Fixes: c5f7700b ("powerpc/powernv: Dynamically release PE")
Reported-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6eaed166

09 9月, 2016 1 次提交

powerpc/powernv: Provide facilities for EOI, usable from real mode · 4ee11c1a

由 Suresh Warrier 提交于 8月 19, 2016

This adds a new function pnv_opal_pci_msi_eoi() which does the part of
end-of-interrupt (EOI) handling of an MSI which involves doing an
OPAL call.  This function can be called in real mode.  This doesn't
just export pnv_ioda2_msi_eoi() because that does a call to
icp_native_eoi(), which does not work in real mode.

This also adds a function, is_pnv_opal_msi(), which KVM can call to
check whether an interrupt is one for which we should be calling
pnv_opal_pci_msi_eoi() when we need to do an EOI.

[paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi()
 from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough
 interrupts in guest"; added is_pnv_opal_msi(); wrote description.]
Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

4ee11c1a

08 9月, 2016 1 次提交

powerpc/powernv: Fix corrupted PE allocation bitmap on releasing PE · caa58f80

由 Gavin Shan 提交于 9月 06, 2016

In pnv_ioda_free_pe(), the PE object (including the associated PE
number) is cleared before resetting the corresponding bit in the
PE allocation bitmap. It means PE#0 is always released to the bitmap
wrongly.

This fixes above issue by caching the PE number before the PE object
is cleared.

Fixes: 1e916772 ("powerpc/powernv: Use PE instead of number during setup and release"
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

caa58f80

06 9月, 2016 1 次提交

powerpc/powernv: Fix crash on releasing compound PE · b314427a

由 Gavin Shan 提交于 9月 06, 2016

The compound PE is created to accommodate the devices attached to
one specific PCI bus that consume multiple M64 segments. The compound
PE is made up of one master PE and possibly multiple slave PEs. The
slave PEs should be destroyed when releasing the master PE. A kernel
crash happens when derferencing @pe->pdev on releasing the slave PE
in pnv_ioda_deconfigure_pe().

  # echo 0 > /sys/bus/pci/slots/C7/power
  iommu: Removing device 0000:01:00.1 from group 0
  iommu: Removing device 0000:01:00.0 from group 0
  Unable to handle kernel paging request for data at address 0x00000010
  Faulting instruction address: 0xc00000000005d898
  cpu 0x1: Vector: 300 (Data Access) at [c000000fe8217620]
      pc: c00000000005d898: pnv_ioda_release_pe+0x288/0x610
      lr: c00000000005dbdc: pnv_ioda_release_pe+0x5cc/0x610
      sp: c000000fe82178a0
     msr: 9000000000009033
     dar: 10
   dsisr: 40000000
    current = 0xc000000fe815ab80
    paca    = 0xc00000000ff00400	 softe: 0	 irq_happened: 0x01
      pid   = 2709, comm = sh
  Linux version 4.8.0-rc5-gavin-00006-g745efdb (gwshan@gwshan) \
  (gcc version 4.9.3 (Buildroot 2016.02-rc2-00093-g5ea3bce) ) #586 SMP \
  Tue Sep 6 13:37:29 AEST 2016
  enter ? for help
  [c000000fe8217940] c00000000005d684 pnv_ioda_release_pe+0x74/0x610
  [c000000fe82179e0] c000000000034460 pcibios_release_device+0x50/0x70
  [c000000fe8217a10] c0000000004aba80 pci_release_dev+0x50/0xa0
  [c000000fe8217a40] c000000000704898 device_release+0x58/0xf0
  [c000000fe8217ac0] c000000000470510 kobject_release+0x80/0xf0
  [c000000fe8217b00] c000000000704dd4 put_device+0x24/0x40
  [c000000fe8217b20] c0000000004af94c pci_remove_bus_device+0x12c/0x150
  [c000000fe8217b60] c000000000034244 pci_hp_remove_devices+0x94/0xd0
  [c000000fe8217ba0] c0000000004ca444 pnv_php_disable_slot+0x64/0xb0
  [c000000fe8217bd0] c0000000004c88c0 power_write_file+0xa0/0x190
  [c000000fe8217c50] c0000000004c248c pci_slot_attr_store+0x3c/0x60
  [c000000fe8217c70] c0000000002d6494 sysfs_kf_write+0x94/0xc0
  [c000000fe8217cb0] c0000000002d50f0 kernfs_fop_write+0x180/0x260
  [c000000fe8217d00] c0000000002334a0 __vfs_write+0x40/0x190
  [c000000fe8217d90] c000000000234738 vfs_write+0xc8/0x240
  [c000000fe8217de0] c000000000236250 SyS_write+0x60/0x110
  [c000000fe8217e30] c000000000009524 system_call+0x38/0x108

It fixes the kernel crash by bypassing releasing resources (DMA,
IO and memory segments, PELTM) because there are no resources assigned
to the slave PE.

Fixes: c5f7700b ("powerpc/powernv: Dynamically release PE")
Reported-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b314427a

22 8月, 2016 1 次提交

powerpc/powernv/pci: fix iterator signedness · 60964816

由 Andrzej Hajda 提交于 8月 17, 2016

Unsigned type is always non-negative, so the loop could not end in case
condition is never true.

The problem has been detected using semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci
Signed-off-by: NAndrzej Hajda <a.hajda@samsung.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

60964816

09 8月, 2016 2 次提交

powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs · 5958d19a

由 Benjamin Herrenschmidt 提交于 7月 08, 2016

The generic allocation code may sometimes decide to assign a prefetchable
64-bit BAR to the M32 window. In fact it may also decide to allocate
a 64-bit non-prefetchable BAR to the M64 one ! So using the resource
flags as a test to decide which window was used for PE allocation is
just wrong and leads to insane PE numbers.

Instead, compare the addresses to figure it out.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rename the function as agreed by Ben & Gavin]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5958d19a

powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again · 4d902195

由 Alexey Kardashevskiy 提交于 8月 03, 2016

Commit fd141d1a ("powerpc/powernv/pci: Rework accessing the TCE
invalidate register") broke TCE invalidation on IODA2/PHB3 for real
mode.

This makes invalidate work again.

Fixes: fd141d1a ("powerpc/powernv/pci: Rework accessing the TCE invalidate register")
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4d902195

04 8月, 2016 1 次提交

dma-mapping: use unsigned long for dma_attrs · 00085f1e

由 Krzysztof Kozlowski 提交于 8月 03, 2016

The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer.  Thus the pointer can point to const data.
However the attributes do not have to be a bitfield.  Instead unsigned
long will do fine:

1. This is just simpler.  Both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing pointer to it to dma_set_attr(), just set the bits.

2. It brings safeness and checking for const correctness because the
   attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.comSigned-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: NVineet Gupta <vgupta@synopsys.com>
Acked-by: NRobin Murphy <robin.murphy@arm.com>
Acked-by: NHans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: NBjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

00085f1e

17 7月, 2016 7 次提交

powerpc/powernv/pci: Check status of a PHB before using it · 08a45b32

由 Benjamin Herrenschmidt 提交于 7月 08, 2016

If the firmware encounters an error (internal or HW) during initialization
of a PHB, it might leave the device-node in the tree but mark it disabled
using the "status" property. We should check it.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

08a45b32

powerpc/powernv/pci: Use the device-tree to get available range of M64's · a1339faf

由 Benjamin Herrenschmidt 提交于 7月 08, 2016

M64's are the configurable 64-bit windows that cover the 64-bit MMIO
space. We used to hard code 16 windows. Newer chips might have a
variable number and might need to reserve some as well (for example
on PHB4/POWER9, M32 and M64 are actually unified and we use M64#0
to map the 32-bit space).

So newer OPALs will provide a property we can use to know what range
of windows is available. The property is named so that it can
eventually support multiple ranges but we only use the first one for
now.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a1339faf

powerpc/powernv/pci: Fallback to OPAL for TCE invalidations · f0228c41

由 Benjamin Herrenschmidt 提交于 7月 08, 2016

If we don't find registers for the PHB or don't know the model
specific invalidation method, use OPAL calls instead.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f0228c41

powerpc/powernv/pci: Rework accessing the TCE invalidate register · fd141d1a

由 Benjamin Herrenschmidt 提交于 7月 08, 2016

It's architected, always in a known place, so there is no need
to keep a separate pointer to it, we use the existing "regs",
and we complement it with a real mode variant.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

# Conflicts:
#	arch/powerpc/platforms/powernv/pci-ioda.c
#	arch/powerpc/platforms/powernv/pci.h
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

fd141d1a

powerpc/powernv/pci: Remove SWINV constants and obsolete TCE code · 08acce1c

由 Benjamin Herrenschmidt 提交于 7月 08, 2016

We have some obsolete code in pnv_pci_p7ioc_tce_invalidate()
to handle some internal lab tools that have stopped being
useful a long time ago. Remove that along with the definition
and test for the TCE_PCI_SWINV_* flags whose value is basically
always the same.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

08acce1c

powerpc/powernv/pci: Rename TCE invalidation calls · a34ab7c3

由 Benjamin Herrenschmidt 提交于 7月 08, 2016

The TCE invalidation functions are fairly implementation specific,
and while the IODA specs more/less describe the register, in practice
various implementation workarounds may be required. So name the
functions after the target PHB.

Note today and for the foreseeable future, there's a 1:1 relationship
between an IODA version and a PHB implementation. There exist another
variant of IODA1 (Torrent) but we never supported in with OPAL and
never will.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a34ab7c3

powerpc/powernv: Discover IODA3 PHBs · fb111334

由 Benjamin Herrenschmidt 提交于 7月 08, 2016

We instanciate them as IODA2. We also change the MSI EOI hack
to only kick on PHB3 since it will not be needed on any new
implementation.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

fb111334

14 7月, 2016 2 次提交

cxl: Add support for interrupts on the Mellanox CX4 · a2f67d5e

由 Ian Munsie 提交于 7月 14, 2016

The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where
interrupts are routed from the networking hardware to the XSL using the
MSIX table, and from there will be transformed back into an MSIX
interrupt using the cxl style interrupts (i.e. using IVTE entries and
ranges to map a PE and AFU interrupt number to an MSIX address).

We want to hide the implementation details of cxl interrupts as much as
possible. To this end, we use a special version of the MSI setup &
teardown routines in the PHB while in cxl mode to allocate the cxl
interrupts and configure the IVTE entries in the process element.

This function does not configure the MSIX table - the CX4 card uses a
custom format in that table and it would not be appropriate to fill that
out in generic code. The rest of the functionality is similar to the
"Full MSI-X mode" described in the CAIA, and this could be easily
extended to support other adapters that use that mode in the future.

The interrupts will be associated with the default context. If the
maximum number of interrupts per context has been limited (e.g. by the
mlx5 driver), it will automatically allocate additional kernel contexts
to associate extra interrupts as required. These contexts will be
started using the same WED that was used to start the default context.
Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a2f67d5e

powerpc/powernv: Add support for the cxl kernel api on the real phb · 4361b034

由 Ian Munsie 提交于 7月 14, 2016

This adds support for the peer model of the cxl kernel api to the
PowerNV PHB, in which physical function 0 represents the cxl function on
the card (an XSL in the case of the CX4), which other physical functions
will use for memory access and interrupt services. It is referred to as
the peer model as these functions are peers of one another, as opposed
to the Virtual PHB model which forms a hierarchy.

This patch exports APIs to enable the peer mode, check if a PCI device
is attached to a PHB in this mode, and to set and get the peer AFU for
this mode.

The cxl driver will enable this mode for supported cards by calling
pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note
that this mode is enabled, and switch out it's controller_ops for the
cxl version.

The cxl version of the controller_ops struct implements it's own
versions of the enable_device_hook and release_device to handle
refcounting on the peer AFU and to allocate a default context for the
device.

Once enabled, the cxl kernel API may not be disabled on a PHB. Currently
there is no safe way to disable cxl mode short of a reboot, so until
that changes there is no reason to support the disable path.
Signed-off-by: NIan Munsie <imunsie@au1.ibm.com>
Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4361b034

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功