- 16 Jul, 2018 3 commits
-
-
Committed by Alexey Kardashevskiy
We want to support sparse memory and therefore huge chunks of DMA windows do not need to be mapped. If a DMA window is big enough to require 2 or more indirect levels, and a DMA window is used to map all RAM (which is the default case for a 64bit window), we can actually save some memory by not allocating TCEs for regions which we are not going to map anyway. The hardware tables already support indirect levels but we also keep a host-physical-to-userspace translation array which is allocated by vmalloc() and is a flat array which might use quite some memory. This converts it_userspace from a vmalloc'ed array to a multi-level table. As the format becomes platform dependent, this replaces the direct access to it_userspace with an iommu_table_ops::useraddrptr hook which returns a pointer to the userspace copy of a TCE; a future extension will return NULL if the level was not allocated. This should not change non-KVM handling of TCE tables and it_userspace will not be allocated for non-KVM tables. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
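The idea of replacing a flat array with a lazily populated multi-level table can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct sparse_table`, `sparse_slot()` and the 512-entry level size are hypothetical names and values, not the kernel's `it_userspace` layout or the real `useraddrptr` hook.

```c
#include <stdint.h>
#include <stdlib.h>

#define LEVEL_BITS 9                      /* assumed: 512 entries per level */
#define LEVEL_SIZE (1UL << LEVEL_BITS)

/* Hypothetical two-level table; second-level chunks stay NULL until used. */
struct sparse_table {
    uint64_t *levels[LEVEL_SIZE];
};

/*
 * Return a pointer to the slot for `index`. When the chunk holding the slot
 * has not been allocated, allocate it if `alloc` is non-zero, otherwise
 * return NULL (mirroring the "return NULL if the level was not allocated"
 * behaviour mentioned in the commit message).
 */
static uint64_t *sparse_slot(struct sparse_table *t, unsigned long index, int alloc)
{
    unsigned long hi = index >> LEVEL_BITS;
    unsigned long lo = index & (LEVEL_SIZE - 1);

    if (hi >= LEVEL_SIZE)
        return NULL;                      /* index outside the table */
    if (!t->levels[hi]) {
        if (!alloc)
            return NULL;
        t->levels[hi] = calloc(LEVEL_SIZE, sizeof(*t->levels[hi]));
        if (!t->levels[hi])
            return NULL;                  /* allocation failed */
    }
    return &t->levels[hi][lo];
}
```

Only the chunks that are actually touched consume memory, which is the saving the commit is after for windows that cover mostly unmapped space.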
-
Committed by Alexey Kardashevskiy
Right now we have allocation code in pci-ioda.c and traversing code in pci.c, let's keep them together. However both files are big enough already so let's move this business to a new file. While we are at it, move the code which links IOMMU table groups to IOMMU tables as it is not specific to any PNV PHB model. This puts exported symbols from the new file together. This fixes several warnings from checkpatch.pl like this: "WARNING: Prefer 'unsigned int' to bare use of 'unsigned'". As this is almost cut-n-paste, there should be no behavioral change. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Alexey Kardashevskiy
This gets rid of a useless wrapper around pnv_pci_ioda2_table_free_pages(). Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 03 Jun, 2018 1 commit
-
-
Committed by Alexey Kardashevskiy
When IODA2 creates a PE, it creates an IOMMU table with it_ops::free set to pnv_ioda2_table_free() which calls pnv_pci_ioda2_table_free_pages(). Since iommu_tce_table_put() calls it_ops::free when the last reference to the table is released, the explicit call to pnv_pci_ioda2_table_free_pages() is not needed so let's remove it. This should fix a double free in the case of PCI hotplug as pnv_pci_ioda2_table_free_pages() resets neither iommu_table::it_base nor ::it_size. This was not exposed by SRIOV as it uses a different code path via pnv_pcibios_sriov_disable(). IODA1 does not initialize it_ops::free so it does not have this issue. Fixes: c5f7700b ("powerpc/powernv: Dynamically release PE") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 18 May, 2018 1 commit
-
-
Committed by Michael Ellerman
This allows us to squash some sparse warnings and also avoids having to do explicit endian conversions in the code. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
-
- 14 May, 2018 1 commit
-
-
Committed by Alexey Kardashevskiy
At the moment we assume that IODA2 and newer PHBs can always do 4K/64K/16M IOMMU pages, however this is not the case for POWER9 and now skiboot advertises the supported sizes via the device tree so we use that instead of hard coding the mask. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
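A minimal sketch of checking a requested IOMMU page size against a mask of supported sizes follows. It assumes the mask is simply the bitwise OR of the supported page sizes; `iommu_page_shift_ok()` and `DEFAULT_PGSIZES` are hypothetical names, and how the mask is actually read from firmware is not shown.

```c
#include <stdbool.h>

#define SZ_4K  (1UL << 12)
#define SZ_64K (1UL << 16)
#define SZ_16M (1UL << 24)

/* The previously hard-coded assumption: 4K, 64K and 16M pages. */
#define DEFAULT_PGSIZES (SZ_4K | SZ_64K | SZ_16M)

/* True when a page of size (1 << page_shift) is present in the mask. */
static bool iommu_page_shift_ok(unsigned long pgsizes, unsigned int page_shift)
{
    if (page_shift >= sizeof(unsigned long) * 8)
        return false;                     /* shift would overflow the mask */
    return (pgsizes & (1UL << page_shift)) != 0;
}
```

With a firmware-provided mask the same check applies unchanged; only the source of `pgsizes` differs.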
-
- 27 Mar, 2018 1 commit
-
-
Committed by Alexey Kardashevskiy
GPUs and the corresponding NVLink bridges get different PEs as they have separate translation validation entries (TVEs). We put these PEs in the same IOMMU group so they cannot be passed through separately. So the iommu_table_group_ops::set_window/unset_window for GPUs do set tables to the NPU PEs as well which means that iommu_table's list of attached PEs (iommu_table_group_link) has both GPU and NPU PEs linked. This list is used for TCE cache invalidation. The problem is that an NPU PE has just a single TVE and can be programmed to point to 32bit or 64bit windows while a GPU PE has two (as any other PCI device). So we end up having a 32bit iommu_table struct linked to both PEs even though only the 64bit TCE table cache can be invalidated on NPU. And a relatively recent skiboot detects this and prints errors. This changes GPU's iommu_table_group_ops::set_window/unset_window to make sure that the NPU PE is only linked to the table actually used by the hardware. If there are two tables used by an IOMMU group, the NPU PE will use the last programmed one which with the current use scenarios is expected to be a 64bit one. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 20 Mar, 2018 1 commit
-
-
Committed by Markus Elfring
It's slightly less error prone to use sizeof(*foo) rather than specifying the type. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> [mpe: Consolidate into one patch, rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
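The pattern can be illustrated with a small userspace sketch. The structure `struct pe_entry` and the helper `pe_entry_alloc()` are hypothetical and exist only for this example; the kernel code uses its own allocators rather than calloc().

```c
#include <stdlib.h>

/* Hypothetical structure, for illustration only. */
struct pe_entry {
    unsigned int number;
    unsigned long flags;
};

static struct pe_entry *pe_entry_alloc(void)
{
    /*
     * sizeof(*pe) follows the declared type of `pe`, so the allocation
     * size stays correct even if that type is changed later.
     * Writing sizeof(struct pe_entry) would need a second manual update.
     */
    struct pe_entry *pe = calloc(1, sizeof(*pe));

    return pe;
}
```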
-
- 22 Feb, 2018 1 commit
-
-
Committed by Ingo Molnar
On lkml suggestions were made to split up such trivial typo fixes into per subsystem patches: --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height) struct efi_uga_draw_protocol *uga = NULL, *first_uga; efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID; unsigned long nr_ugas; - u32 *handles = (u32 *)uga_handle;; + u32 *handles = (u32 *)uga_handle; efi_status_t status = EFI_INVALID_PARAMETER; int i; This patch is the result of the following script: $ sed -i 's/;;$/;/g' $(git grep -E ';;$' | grep "\.[ch]:" | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq) ... followed by manual review to make sure it's all good. Splitting this up is just crazy talk, let's get over with this and just do it. Reported-by: Pavel Machek <pavel@ucw.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 27 Jan, 2018 1 commit
-
-
Committed by Alexey Kardashevskiy
The pcidev value stored in pci_dn is only used for NPU/NPU2 initialization. We can easily drop the cached pointer and use an ancient helper - pci_get_domain_bus_and_slot() - instead in order to reduce complexity. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 24 Jan, 2018 2 commits
-
-
Committed by Andrew Donnellan
The configuration space for opencapi devices doesn't have a PCI Express capability, which confuses Linux into thinking it's of an old PCI type with a 256-byte configuration space size, instead of the desired 4k. So add a PCI fixup to declare the correct size. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Frederic Barrat
The NPU was already abstracted by opal as a virtual PHB for nvlink, but it helps to be able to differentiate between an nvlink and an opencapi PHB, as it's not completely transparent to Linux. In particular, PE assignment differs and we'll also need the information in later patches. So rename the existing PNV_PHB_NPU type to PNV_PHB_NPU_NVLINK and add a new type PNV_PHB_NPU_OCAPI. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 22 Jan, 2018 1 commit
-
-
Committed by Guilherme G. Piccoli
During a kdump kernel boot on PowerPC, we request a reset of the PHBs from the FW. It makes sense, since if we are booting a kdump kernel it means we had some trouble before and we cannot rely on the adapters' health; they could be in a bad state, hence the reset is needed. But this reset is useful not only in kdump - there are situations, especially when debugging drivers, in which we could break an adapter in a way that requires such a reset. One could say to just go ahead and reboot the machine, but it happens that many times doing kexec is much faster, and so preferable to a full power cycle. This patch adds the ppc_pci_reset_phbs parameter to perform such a reset. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 20 Jan, 2018 1 commit
-
-
Committed by Alexey Kardashevskiy
9003a249 removed the check from the DMA window pages allocator, however the VFIO driver tests limits before doing so by calling the get_table_size hook which was left behind; this fixes it. Fixes: 9003a249 "powerpc/powernv/ioda: Remove explicit max window size check" Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 10 Jan, 2018 1 commit
-
-
Committed by Christoph Hellwig
We want to use the dma_direct_ namespace for a generic implementation, so rename powerpc to the second best choice: dma_nommu_. Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 11 Dec, 2017 1 commit
-
-
Committed by Bryant G. Ly
SR-IOV can now be enabled for the powernv platform and the pseries platform. Therefore move the appropriate calls to machine dependent code instead of relying on a definition at compile time. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@us.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 04 Dec, 2017 1 commit
-
-
Committed by Joe Perches
At some point, pr_warning will be removed so all logging messages use a consistent <prefix>_warn style. Update arch/powerpc/ Miscellanea: o Coalesce formats o Realign arguments o Use %s, __func__ instead of embedded function names o Remove unnecessary line continuations Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Geoff Levand <geoff@infradead.org> [mpe: Rebase due to some %pOF changes.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 07 Nov, 2017 1 commit
-
-
Committed by Alexey Kardashevskiy
DMA windows can only have a power-of-two size on IODA2 hardware and using memory_hotplug_max() to determine the upper limit won't work correctly if it returns a non-power-of-two value. This removes the check as the platform code does this check in pnv_pci_ioda2_setup_default_config() anyway; the other client is VFIO and that thing checks against the locked_vm limit which prevents userspace from locking too much memory. It is expected to impact DPDK on machines with non-power-of-two RAM size, mostly. KVM guests are less likely to be affected as usually guests get less than half of the host's RAM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
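To see why a non-power-of-two limit is awkward, here is a small sketch with hypothetical helpers `is_pow2()` and `roundup_pow2()` (userspace stand-ins, not the kernel's own helpers). A window that has to cover, say, 3 GiB of RAM must itself be 4 GiB, which a "window size <= RAM size" style check would reject.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when n has exactly one bit set. */
static bool is_pow2(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Smallest power of two that is >= n.
 * Returns 1 for n == 0 and 0 when the result would not fit in 64 bits.
 */
static uint64_t roundup_pow2(uint64_t n)
{
    uint64_t p = 1;

    while (p != 0 && p < n)
        p <<= 1;                          /* wraps to 0 past 2^63 */
    return p;
}
```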
-
- 06 Nov, 2017 1 commit
-
-
Committed by Alexey Kardashevskiy
In order to make generic IOV code work, the physical function IOV BAR should start from the offset of the first VF. Since M64 segments share the PE number space across a PHB, and some PEs may be in use at the time when IOV is enabled, the existing code shifts the IOV BAR to the index of the first PE/VF. This creates a hole in IOMEM space which can potentially be taken by some other device. This reserves a temporary hole on a parent and releases it when IOV is disabled; the temporary resources are stored in pci_dn to avoid kmalloc/free. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 26 Sep, 2017 1 commit
-
-
Committed by Benjamin Herrenschmidt
Remove the post_init callback which is only used by powernv, we can just call it explicitly from the powernv code. This partially kills the ability to "disable" eeh at runtime via debugfs as this was calling that same callback again, but this is both unused and broken in several ways. If we want to revive it, we need to create a dedicated enable/disable callback on the backend that does the right thing. Let the bulk of eeh initialize normally at core_initcall() like it does on pseries by removing the hack in eeh_init() that delays it. Instead we make sure our eeh->probe cleanly bails out if the PEs haven't been created yet and we force a re-probe where we used to call eeh_init() again. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 23 Aug, 2017 1 commit
-
-
Committed by Rob Herring
Now that we have a custom printf format specifier, convert users of full_name to use %pOF instead. This is preparation to remove storing of the full path string for each node. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Anatolij Gustschin <agust@denx.de> Cc: Scott Wood <oss@buserror.net> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linuxppc-dev@lists.ozlabs.org Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 08 Aug, 2017 1 commit
-
-
Committed by Frederic Barrat
P9 has support for PCI peer-to-peer, enabling a device to write in the MMIO space of another device directly, without interrupting the CPU. This patch adds support for it on powernv, by adding a new API to be called by drivers. The pnv_pci_set_p2p(...) call configures an 'initiator', i.e. the device which will issue the MMIO operation, and a 'target', i.e. the device on the receiving side. P9 really only supports MMIO stores for the time being but that's expected to change in the future, so the API allows defining both load and store operations. /* PCI p2p descriptor */ #define OPAL_PCI_P2P_ENABLE 0x1 #define OPAL_PCI_P2P_LOAD 0x2 #define OPAL_PCI_P2P_STORE 0x4 int pnv_pci_set_p2p(struct pci_dev *initiator, struct pci_dev *target, u64 desc) It uses a new OPAL call, as the configuration magic is done on the PHBs by skiboot. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> [mpe: Drop unrelated OPAL calls, s/uint64_t/u64/, minor formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
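As a rough illustration of how the descriptor quoted above might be composed, the sketch below ORs the three flag values together. The flag values come from the commit message; the helper `p2p_desc()` is hypothetical, and treating `desc` as a plain bitwise OR of the flags is an assumption rather than something the message states.

```c
#include <stdint.h>

/* Flag values as quoted in the commit message. */
#define OPAL_PCI_P2P_ENABLE 0x1
#define OPAL_PCI_P2P_LOAD   0x2
#define OPAL_PCI_P2P_STORE  0x4

/*
 * Hypothetical helper: build the `desc` argument, assuming the flags are
 * combined with bitwise OR.
 */
static uint64_t p2p_desc(int enable, int load, int store)
{
    uint64_t desc = 0;

    if (enable)
        desc |= OPAL_PCI_P2P_ENABLE;
    if (load)
        desc |= OPAL_PCI_P2P_LOAD;
    if (store)
        desc |= OPAL_PCI_P2P_STORE;
    return desc;
}
```

Under that assumption, a driver enabling store-only peer-to-peer (the only mode the message says P9 supports) would pass `p2p_desc(1, 0, 1)` as the third argument of pnv_pci_set_p2p().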
-
- 28 Jul, 2017 1 commit
-
-
Committed by Alistair Popple
Commit 8e3f1b1d ("powerpc/powernv/pci: Enable 64-bit devices to access >4GB DMA space") introduced the ability for PCI device drivers to request a DMA mask between 64 and 32 bits and actually get a mask greater than 32 bits. However, currently if certain machine configuration dependent conditions are not met the code silently falls back to a 32-bit mask. This makes it hard for device drivers to detect which mask they actually got. Instead we should return an error when the request could not be fulfilled, which allows drivers to either fall back or implement other workarounds as documented in DMA-API-HOWTO.txt. Signed-off-by: Alistair Popple <alistair@popple.id.au> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 27 Jun, 2017 3 commits
-
-
Committed by Russell Currey
On PHB3/POWER8 systems, devices can select between two different sections of address space, TVE#0 and TVE#1. TVE#0 is intended for 32bit devices that aren't capable of addressing more than 4GB. Selecting TVE#1 instead, with the capability of addressing over 4GB, is performed by setting bit 59 of a PCI address. However, some devices aren't capable of addressing at least 59 bits, but still want more than 4GB of DMA space. In order to enable this, reconfigure TVE#0 to be suitable for 64-bit devices by allocating memory past the initial 4GB that is inaccessible by 64-bit DMAs. This bypass mode is only enabled if a device requests 4GB or more of DMA address space, if the system has PHB3 (POWER8 systems), and if the device does not share a PE with any devices from different vendors. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
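The TVE selection by address bit can be sketched as follows. The macro name `TVE1_SELECT` and both helpers are hypothetical, and the sketch assumes "bit 59" means the bit with value `1 << 59` (counting from the least significant bit as bit 0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding of "bit 59 of a PCI address"; the name is hypothetical. */
#define TVE1_SELECT (1ULL << 59)

/* Tag a DMA address so that it is routed through TVE#1. */
static uint64_t tve1_addr(uint64_t dma_addr)
{
    return dma_addr | TVE1_SELECT;
}

/*
 * A device can only use TVE#1 if its DMA mask lets it drive the select
 * bit; otherwise every address it emits lands in TVE#0.
 */
static bool mask_reaches_tve1(uint64_t dma_mask)
{
    return (dma_mask & TVE1_SELECT) != 0;
}
```

A device with, for example, a 48-bit DMA mask fails `mask_reaches_tve1()`, which is the situation the commit addresses by making TVE#0 usable beyond 4GB.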
-
Committed by Russell Currey
Add a helper that determines whether all the devices contained in a given PE are from the same vendor. This can be useful in determining if it's okay to make PE-wide changes that may be suitable for some devices but not for others. This is used later in the series. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Russell Currey
Diagnostic data for PHBs currently works by allocating a fixed-size buffer. This is simple, but either wastes memory (though only a few kilobytes) or in the case of PHB4 isn't enough to fit the whole data blob. For machines that don't describe the diagnostic data size in the device tree, use the hardcoded buffer size as before. For those that do, only allocate exactly what's needed. In the special case of P7IOC (which has two types of diag data), the larger should be specified in the device tree. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 03 May, 2017 1 commit
-
-
Committed by Alistair Popple
Commit 616badd2 ("powerpc/powernv: Use OPAL call for TCE kill on NVLink2") forced all TCE kills to go via the OPAL call for NVLink2. However the PHB3 implementation of TCE kill was still being called directly from some functions which in some circumstances caused a machine check. This patch adds an equivalent IODA2 version of the function which uses the correct invalidation method depending on PHB model and changes all external callers to use it instead. Fixes: 616badd2 ("powerpc/powernv: Use OPAL call for TCE kill on NVLink2") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 28 Apr, 2017 2 commits
-
-
Committed by Alexey Kardashevskiy
When userspace requests a small TCE table (which takes less than the system page size) and more than 1 TCE level, the existing code returns a single page size which is a bug as each additional TCE level requires at least one page and this is what pnv_pci_ioda2_table_alloc_pages() does. And we end up seeing WARN_ON(!ret && ((*ptbl)->it_allocated_size != table_size)) in drivers/vfio/vfio_iommu_spapr_tce.c. This replaces the incorrect _ALIGN_UP() (which aligns zero up to zero) with max_t() to fix the bug. Besides removing the WARN_ON(), there should be no other changes in behaviour. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
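The difference between aligning up and taking a maximum is easy to demonstrate in isolation. The helpers `align_up()` and `max_ul()` below are userspace stand-ins written for this sketch, not the kernel's `_ALIGN_UP()` and `max_t()` macros, and the 4096-byte page size is just an example value.

```c
/* Round x up to a multiple of a, where a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Larger of the two values. */
static unsigned long max_ul(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}
```

For a per-level size that has been divided down to zero, `align_up(0, 4096)` stays 0, so the level is not accounted for, whereas `max_ul(0, 4096)` yields 4096, i.e. at least one page per level.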
-
Committed by Alexey Kardashevskiy
pnv_pci_table_alloc() ignores possible failure from kzalloc_node(), this adds a check. There are 2 callers of pnv_pci_table_alloc(), one already checks for tbl!=NULL, this adds WARN_ON() to the other path which only happens during boot time in IODA1 and is not expected to fail. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 20 Apr, 2017 1 commit
-
-
Committed by Yongji Xie
Override pcibios_default_alignment() to set the default alignment to PAGE_SIZE for all PCI devices on the PowerNV platform. Thus sub-page BARs would not share a page and could be mapped into a guest when VFIO passes them through. Signed-off-by: Yongji Xie <elohimes@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 11 Apr, 2017 1 commit
-
-
Committed by Michael Ellerman
powerpc_debugfs_root is the dentry representing the root of the "powerpc" directory tree in debugfs. Currently it sits in asm/debug.h, along with some other things that have "debug" in the name, but are otherwise unrelated. Pull it out into a separate header, which also includes linux/debugfs.h, and convert all the users to include debugfs.h instead of debug.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 04 Apr, 2017 1 commit
-
-
Committed by Alistair Popple
Nvlink2 supports address translation services (ATS) allowing devices to request address translations from an mmu known as the nest MMU which is set up to walk the CPU page tables. To access this functionality certain firmware calls are required to set up and manage hardware context tables in the nvlink processing unit (NPU). The NPU also manages forwarding of TLB invalidates (known as address translation shootdowns/ATSDs) to attached devices. This patch exports several methods to allow device drivers to register a process id (PASID/PID) in the hardware tables and to receive notification of when a device should stop issuing address translation requests (ATRs). It also adds a fault handler to allow device drivers to demand fault pages in. Signed-off-by: Alistair Popple <alistair@popple.id.au> [mpe: Fix up comment formatting, use flush_tlb_mm()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 30 Mar, 2017 3 commits
-
-
Committed by Alexey Kardashevskiy
So far iommu_table objects were only used in virtual mode and had a single owner. We are going to change this by implementing in-kernel acceleration of DMA mapping requests. The proposed acceleration will handle requests in real mode and KVM will keep references to tables. This adds a kref to iommu_table and defines new helpers to update it. This replaces iommu_free_table() with iommu_tce_table_put() and makes iommu_free_table() static. iommu_tce_table_get() is not used in this patch but it will be in the following patch. Since this touches prototypes, this also removes the @node_name parameter as it has never been really useful on powernv and carrying it for the pseries platform code to iommu_free_table() seems to be quite useless as well. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Alexey Kardashevskiy
At the moment iommu_table can be disposed of by either calling iommu_table_free() directly or it_ops::free(); the only implementation of free() is in IODA2 - pnv_ioda2_table_free() - and it calls iommu_table_free() anyway. As we are going to have reference counting on tables, we need a unified way of disposing of tables. This moves the it_ops::free() call into iommu_free_table() and makes use of the latter. The free() callback now handles only platform-specific data. As from now on iommu_free_table() calls it_ops->free(), we need to have it_ops initialized before calling iommu_free_table() so this moves this initialization into pnv_pci_ioda2_create_table(). This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
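The combination of a reference count with a platform-specific free() callback can be sketched in userspace C. Everything here is a hypothetical stand-in: `struct table`, `table_get()`, `table_put()` and the plain integer counter only mimic the roles of iommu_table, its get/put helpers and the kref; the counter is not atomic as a real kref would be.

```c
#include <stdlib.h>

struct table;

/* Platform hook that releases only platform-specific data. */
struct table_ops {
    void (*free)(struct table *tbl);
};

struct table {
    unsigned int refcount;                /* stand-in for a kref, not atomic */
    const struct table_ops *ops;
    void *platform_data;
};

static int platform_free_calls;           /* lets callers observe the hook */

static void platform_free(struct table *tbl)
{
    free(tbl->platform_data);
    tbl->platform_data = NULL;
    platform_free_calls++;
}

static const struct table_ops platform_ops = { .free = platform_free };

/* Create a table holding one reference; ops must be set before any put. */
static struct table *table_create(const struct table_ops *ops)
{
    struct table *tbl = calloc(1, sizeof(*tbl));

    if (!tbl)
        return NULL;
    tbl->refcount = 1;
    tbl->ops = ops;
    tbl->platform_data = malloc(64);
    return tbl;
}

static struct table *table_get(struct table *tbl)
{
    tbl->refcount++;
    return tbl;
}

/* Drop a reference; returns 1 when this call freed the table. */
static int table_put(struct table *tbl)
{
    if (--tbl->refcount)
        return 0;
    if (tbl->ops && tbl->ops->free)
        tbl->ops->free(tbl);              /* platform part first */
    free(tbl);                            /* then the generic part */
    return 1;
}
```

Because the last put is the only place that frees anything, no caller needs to invoke the platform free routine explicitly, which avoids releasing the same pages twice.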
-
Committed by Alexey Kardashevskiy
In real mode, TCE tables are invalidated using special cache-inhibited store instructions which are not available in virtual mode. This defines and implements the exchange_rm() callback. This does not define set_rm/clear_rm/flush_rm callbacks as there is no user for those - exchange/exchange_rm are only to be used by KVM for VFIO. The exchange_rm callback is defined for IODA1/IODA2 powernv platforms. This replaces list_for_each_entry_rcu with its lockless version as from now on pnv_pci_ioda2_tce_invalidate() can be called in real mode too. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 20 Mar, 2017 1 commit
-
-
Committed by Alexey Kardashevskiy
PNV_IODA_PE_DEV is only used for NPU devices (emulated PCI bridges representing NVLink). These are added to IOMMU groups with corresponding NVIDIA devices after all non-NPU PEs are set up; a special helper - pnv_pci_ioda_setup_iommu_api() - handles this in pnv_pci_ioda_fixup(). The pnv_pci_ioda2_setup_dma_pe() helper sets up DMA for a PE. It is called for VFs (so it does not handle the NPU case) and PCI bridges but only IODA1 and IODA2 types. An NPU bridge has its own type id (PNV_PHB_NPU) so pnv_pci_ioda2_setup_dma_pe() cannot be called on NPU and therefore (pe->flags & PNV_IODA_PE_DEV) is always "false". This removes the unused iommu_add_device(). This should not cause any behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 09 Mar, 2017 2 commits
-
-
Committed by Alexey Kardashevskiy
On the POWERNV platform, in order to do DMA via IOMMU (i.e. 32bit DMA in our case), a device needs an iommu_table pointer set via set_iommu_table_base(). The codeflow is: - pnv_pci_ioda2_setup_dma_pe() - pnv_pci_ioda2_setup_default_config() - pnv_ioda_setup_bus_dma() [1] pnv_pci_ioda2_setup_dma_pe() creates IOMMU groups, pnv_pci_ioda2_setup_default_config() does default DMA setup, pnv_ioda_setup_bus_dma() takes a bus PE (on IODA2, all physical function PEs are bus PEs except NPU), walks through all underlying buses and devices, adds all devices to an IOMMU group and sets iommu_table. On IODA2, when VFIO is used, it takes ownership over a PE which means it removes all tables and creates new ones (with a possibility of sharing them among PEs). So when the ownership is returned from VFIO to the kernel, the iommu_table pointer written to a device at [1] is stale and needs an update. This adds an "add_to_group" parameter to pnv_ioda_setup_bus_dma() (in fact re-adds as it used to be there a while ago for different reasons) to tell the helper if a device needs to be added to an IOMMU group with an iommu_table update or just the latter. This calls pnv_ioda_setup_bus_dma(..., false) from pnv_ioda2_release_ownership() so when the ownership is restored, 32bit DMA can work again for a device. This does the same thing on obtaining ownership as the iommu_table pointer is stale at this point anyway and it is safer to have NULL there. We did not hit this earlier as all tested devices in recent years were only using 64bit DMA; the rare exception for this is the MPT3 SAS adapter which uses both 32bit and 64bit DMA access and it has not been tested with VFIO much. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Alexey Kardashevskiy
The IODA2 specification says that a 64bit DMA address cannot use the top 4 bits (3 are reserved and one is a "TVE select"); the bottom page_shift bits cannot be used for multilevel table addressing either. The existing IODA2 table allocation code aligns the minimum TCE table size to PAGE_SIZE so in the case of 64K system pages and 4K IOMMU pages, we have 64-4-12=48 bits. Since a 64K page stores 8192 TCEs, i.e. needs 13 bits, the maximum number of levels is 48/13 = 3 so we physically cannot address more and EEH happens on DMA accesses. This adds a check that too many levels were requested. It is still possible to have 5 levels in the case of 4K system page size. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
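The arithmetic in this message can be written out as a small function. `max_tce_levels()` is a hypothetical helper that follows the calculation described above (64 address bits, minus the 4 unusable top bits, minus the IOMMU page shift, divided by the bits consumed per level, with 8-byte TCEs filling one system page per level); it is not the check the kernel patch actually adds.

```c
/*
 * Maximum number of TCE table levels that fit in the usable address bits.
 * sys_page_shift:   log2 of the system page size (one page per level)
 * iommu_page_shift: log2 of the IOMMU page size
 */
static unsigned int max_tce_levels(unsigned int sys_page_shift,
                                   unsigned int iommu_page_shift)
{
    unsigned int usable_bits = 64 - 4 - iommu_page_shift;
    unsigned int bits_per_level = sys_page_shift - 3;   /* log2(page / 8) */

    return usable_bits / bits_per_level;
}
```

With 64K system pages and 4K IOMMU pages this gives 48 / 13 = 3 levels, and with 4K system pages 48 / 9 = 5 levels, matching the two figures quoted in the message.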
-
- 28 Feb, 2017 1 commit
-
-
Committed by Masahiro Yamada
Fix typos and add the following to the scripts/spelling.txt: overrided||overridden Link: http://lkml.kernel.org/r/1481573103-11329-22-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 Feb, 2017 1 commit
-
-
Committed by Gavin Shan
The local variable @iov isn't used, so remove it. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-