- 30 3月, 2017 1 次提交
-
-
由 Alexey Kardashevskiy 提交于
So far iommu_table obejcts were only used in virtual mode and had a single owner. We are going to change this by implementing in-kernel acceleration of DMA mapping requests. The proposed acceleration will handle requests in real mode and KVM will keep references to tables. This adds a kref to iommu_table and defines new helpers to update it. This replaces iommu_free_table() with iommu_tce_table_put() and makes iommu_free_table() static. iommu_tce_table_get() is not used in this patch but it will be in the following patch. Since this touches prototypes, this also removes @node_name parameter as it has never been really useful on powernv and carrying it for the pseries platform code to iommu_free_table() seems to be quite useless as well. This should cause no behavioral change. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Acked-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 30 1月, 2017 1 次提交
-
-
由 Alistair Popple 提交于
Add detection of NPU2 PHBs. NPU2/NVLink2 has a different register layout for the TCE kill register therefore TCE invalidation should be done via the OPAL call rather than using the register directly as it is for PHB3 and NVLink1. This changes TCE invalidation to use the OPAL call in the case of a NPU2 PHB model. Signed-off-by: NAlistair Popple <alistair@popple.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 22 11月, 2016 1 次提交
-
-
由 Russell Currey 提交于
PHB, PE (and by association MVE) numbers are printed as a mix of decimal and hexadecimal throughout the kernel. This can be misleading, so make them all hexadecimal. Standardising on hex instead of dec because: - PHB numbers are presented in hex in sysfs/debugfs (and lspci, etc) - PE numbers are presented as hex in sysfs and parsed in hex in debugfs The only place I think this could cause confusing are the messages during boot, i.e. pci 000a:01 : [PE# 000] Secondary bus 1 associated with PE#0 which can be a quick way to check PE numbers. pe_level_printk() will only print two characters instead of three, so the above would be pci 000a:01 : [PE# 00] Secondary bus 1 associated with PE#0 which gives a hint it's in hex. Signed-off-by: NRussell Currey <ruscur@russell.cc> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 04 10月, 2016 1 次提交
-
-
由 Gavin Shan 提交于
This fixes the warnings reported from sparse: pci.c:312:33: warning: restricted __be64 degrades to integer pci.c:313:33: warning: restricted __be64 degrades to integer Fixes: cee72d5b ("powerpc/powernv: Display diag data on p7ioc EEH errors") Cc: stable@vger.kernel.org # v3.3+ Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 20 9月, 2016 1 次提交
-
-
由 Michael Ellerman 提交于
NO_IRQ has been == 0 on powerpc for just over ten years (since commit 0ebfff14 ("[POWERPC] Add new interrupt mapping core and change platforms to use it")). It's also 0 on most other arches. Although it's fairly harmless, every now and then it causes confusion when a driver is built on powerpc and another arch which doesn't define NO_IRQ. There's at least 6 definitions of NO_IRQ in drivers/, at least some of which are to work around that problem. So we'd like to remove it. This is fairly trivial in the arch code, we just convert: if (irq == NO_IRQ) to if (!irq) if (irq != NO_IRQ) to if (irq) irq = NO_IRQ; to irq = 0; return NO_IRQ; to return 0; And a few other odd cases as well. At least for now we keep the #define NO_IRQ, because there is driver code that uses NO_IRQ and the fixes to remove those will go via other trees. Note we also change some occurrences in PPC sound drivers, drivers/ps3, and drivers/macintosh. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 04 8月, 2016 1 次提交
-
-
由 Krzysztof Kozlowski 提交于
The dma-mapping core and the implementations do not change the DMA attributes passed by pointer. Thus the pointer can point to const data. However the attributes do not have to be a bitfield. Instead unsigned long will do fine: 1. This is just simpler. Both in terms of reading the code and setting attributes. Instead of initializing local attributes on the stack and passing pointer to it to dma_set_attr(), just set the bits. 2. It brings safeness and checking for const correctness because the attributes are passed by value. Semantic patches for this change (at least most of them): virtual patch virtual context @r@ identifier f, attrs; @@ f(..., - struct dma_attrs *attrs + unsigned long attrs , ...) { ... } @@ identifier r.f; @@ f(..., - NULL + 0 ) and // Options: --all-includes virtual patch virtual context @r@ identifier f, attrs; type t; @@ t f(..., struct dma_attrs *attrs); @@ identifier r.f; @@ f(..., - NULL + 0 ) Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.comSigned-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: NVineet Gupta <vgupta@synopsys.com> Acked-by: NRobin Murphy <robin.murphy@arm.com> Acked-by: NHans-Christian Noren Egtvedt <egtvedt@samfundet.no> Acked-by: Mark Salter <msalter@redhat.com> [c6x] Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris] Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm] Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp] Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core] Acked-by: David Vrabel <david.vrabel@citrix.com> [xen] Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb] Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: NBjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32] Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 7月, 2016 1 次提交
-
-
由 Alexey Kardashevskiy 提交于
The iommu_table_ops::exchange() callback writes new TCE to the table and returns old value and permission mask. The old TCE value is correctly converted from BE to CPU endian; however permission mask was calculated from BE value and therefore always returned DMA_NONE which could cause memory leak on LE systems using VFIO SPAPR TCE IOMMU v1 driver. This fixes pnv_tce_xchg() to have @oldtce a CPU endian. Fixes: 05c6cfb9 ("powerpc/iommu/powernv: Release replaced TCE") Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 17 7月, 2016 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
We instanciate them as IODA2. We also change the MSI EOI hack to only kick on PHB3 since it will not be needed on any new implementation. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 21 6月, 2016 4 次提交
-
-
由 Gavin Shan 提交于
This exports 4 functions, which base on the corresponding OPAL APIs to get/set PCI slot status. Those functions are going to be used by PowerNV PCI hotplug driver: pnv_pci_get_device_tree() opal_get_device_tree() pnv_pci_get_presence_state() opal_pci_get_presence_state() pnv_pci_get_power_state() opal_pci_get_power_state() pnv_pci_set_power_state() opal_pci_set_power_state() Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Gavin Shan 提交于
This introduces pnv_pci_get_slot_id() to get the hotpluggable PCI slot ID from the corresponding device node. It will be used by hotplug driver. Requested-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Gavin Shan 提交于
The pdn (struct pci_dn) instances are allocated from memblock or bootmem when creating PCI controller (hoses) in setup_arch(). PCI hotplug, which will be supported by proceeding patches, releases PCI device nodes and their corresponding pdn on unplugging event. The memory chunks for pdn instances allocated from memblock or bootmem are hard to reused after being released. This delays creating pdn by pci_devs_phb_init() from setup_arch() to core_initcall() so that they are allocated from slab. The memory consumed by pdn can be released to system without problem during PCI unplugging time. It indicates that pci_dn is unavailable in setup_arch() and the the fixup on pdn (like AGP's) can't be carried out that time. We have to do that in pcibios_root_bridge_prepare() on maple/pasemi/powermac platforms where/when the pdn is available. pcibios_root_bridge_prepare is called from subsys_initcall() which is executed after core_initcall() so the code flow does not change. At the mean while, the EEH device is created when pdn is populated, meaning pdn and EEH device have same life cycle. In turn, we needn't call eeh_dev_init() to create EEH device explicitly. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Gavin Shan 提交于
The macro defined in arch/powerpc/platforms/powernv/pci.c isn't used by anyone. Just remove it. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 11 5月, 2016 3 次提交
-
-
由 Gavin Shan 提交于
This changes the data type of PE number from "int" to "unsigned int" in order to match the fact PE number is never negative: * The number of PE to which the specified PCI device is attached. * The PE number map for SRIOV VFs. * The returned PE number from pnv_ioda_alloc_pe(). * The returned PE number from pnv_ioda2_pick_m64_pe(). Suggested-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-By: NAlistair Popple <alistair@popple.id.au> Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Gavin Shan 提交于
This renames the fields related to PE number in "struct pnv_phb" for better reflecting of their usages as Alexey suggested. No logical changes introduced. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alexey Kardashevskiy 提交于
When cfg_dbg() is enabled (i.e. mapped to printk()), gcc produces errors as the __func__ parameter is missing (pnv_pci_cfg_read() has one); this adds the missing parameter. cfg_dbg() is just an inferior version of pr_devel() so use the latter instead. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 17 2月, 2016 1 次提交
-
-
由 Alexey Kardashevskiy 提交于
Quite often drivers set only "write" permission assuming that this includes "read" permission as well and this works on plenty of platforms. However IODA2 is strict about this and produces an EEH when "read" permission is not set and reading happens. This adds a workaround in the IODA code to always add the "read" bit when the "write" bit is set. Fixes: 10b35b2b ("powerpc/powernv: Do not set "read" flag if direction==DMA_NONE") Cc: stable@vger.kernel.org # 4.2+ Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Tested-by: NDouglas Miller <dougmill@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 15 2月, 2016 1 次提交
-
-
由 Gavin Shan 提交于
When PCI bus is unplugged during full hotplug for EEH recovery, the platform PE instance (struct pnv_ioda_pe) isn't released and it dereferences the stale PCI bus that has been released. It leads to kernel crash when referring to the stale PCI bus. This fixes the issue by correcting the PE's primary bus when it's oneline at plugging time, in pnv_pci_dma_bus_setup() which is to be called by pcibios_fixup_bus(). Cc: stable@vger.kernel.org # v4.1+ Reported-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Reported-by: NPradipta Ghosh <pradghos@in.ibm.com> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 08 2月, 2016 1 次提交
-
-
由 Russell Currey 提交于
"p5ioc2 is used by approximately 2 machines in the world, and has never ever been a supported configuration." The code for p5ioc2 is essentially unused and complicates what is already a very complicated codebase. Its removal is essentially a "free win" in the effort to simplify the powernv PCI code. In addition, support for p5ioc2 has been dropped from skiboot. There's no reason to keep it around in the kernel. Signed-off-by: NRussell Currey <ruscur@russell.cc> Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: NStewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 11 1月, 2016 1 次提交
-
-
由 Russell Currey 提交于
PCI in powernv now supports quite a bit more than p5ioc2, so remove the outdated comment. Signed-off-by: NRussell Currey <ruscur@russell.cc> Acked-by: NStewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 17 12月, 2015 1 次提交
-
-
由 Alistair Popple 提交于
NVLink is a high speed interconnect that is used in conjunction with a PCI-E connection to create an interface between CPU and GPU that provides very high data bandwidth. A PCI-E connection to a GPU is used as the control path to initiate and report status of large data transfers sent via the NVLink. On IBM Power systems the NVLink processing unit (NPU) is similar to the existing PHB3. This patch adds support for a new NPU PHB type. DMA operations on the NPU are not supported as this patch sets the TCE translation tables to be the same as the related GPU PCIe device for each NVLink. Therefore all DMA operations are setup and controlled via the PCIe device. EEH is not presently supported for the NPU devices, although it may be added in future. Signed-off-by: NAlistair Popple <alistair@popple.id.au> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 10 9月, 2015 1 次提交
-
-
由 Paul Mackerras 提交于
This fixes a race which can result in the same virtual IRQ number being assigned to two different MSI interrupts. The most visible consequence of that is usually a warning and stack trace from the sysfs code about an attempt to create a duplicate entry in sysfs. The race happens when one CPU (say CPU 0) is disposing of an MSI while another CPU (say CPU 1) is setting up an MSI. CPU 0 calls (for example) pnv_teardown_msi_irqs(), which calls msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its hardware IRQ number) is no longer in use. Then, before CPU 0 gets to calling irq_dispose_mapping() to free up the virtal IRQ number, CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an MSI, and gets the same hardware IRQ number that CPU 0 just freed. CPU 1 then calls irq_create_mapping() to get a virtual IRQ number, which sees that there is currently a mapping for that hardware IRQ number and returns the corresponding virtual IRQ number (which is the same virtual IRQ number that CPU 0 was using). CPU 0 then calls irq_dispose_mapping() and frees that virtual IRQ number. Now, if another CPU comes along and calls irq_create_mapping(), it is likely to get the virtual IRQ number that was just freed, resulting in the same virtual IRQ number apparently being used for two different hardware interrupts. To fix this race, we just move the call to msi_bitmap_free_hwirqs() to after the call to irq_dispose_mapping(). Since virq_to_hw() doesn't work for the virtual IRQ number after irq_dispose_mapping() has been called, we need to call it before irq_dispose_mapping() and remember the result for the msi_bitmap_free_hwirqs() call. The pattern of calling msi_bitmap_free_hwirqs() before irq_dispose_mapping() appears in 5 places under arch/powerpc, and appears to have originated in commit 05af7bd2 ("[POWERPC] MPIC U3/U4 MSI backend") from 2007. Fixes: 05af7bd2 ("[POWERPC] MPIC U3/U4 MSI backend") Cc: stable@vger.kernel.org # v2.6.22+ Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 18 8月, 2015 1 次提交
-
-
由 Andrew Donnellan 提交于
Simplify the dma_get_required_mask call chain by moving it from pnv_phb to pci_controller_ops, similar to commit 763d2d8d ("powerpc/powernv: Move dma_set_mask from pnv_phb to pci_controller_ops"). Previous call chain: 0) call dma_get_required_mask() (kernel/dma.c) 1) call ppc_md.dma_get_required_mask, if it exists. On powernv, that points to pnv_dma_get_required_mask() (platforms/powernv/setup.c) 2) device is PCI, therefore call pnv_pci_dma_get_required_mask() (platforms/powernv/pci.c) 3) call phb->dma_get_required_mask if it exists 4) it only exists in the ioda case, where it points to pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c) New call chain: 0) call dma_get_required_mask() (kernel/dma.c) 1) device is PCI, therefore call pci_controller_ops.dma_get_required_mask if it exists 2) in the ioda case, that points to pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c) In the p5ioc2 case, the call chain remains the same - dma_get_required_mask() does not find either a ppc_md call or pci_controller_ops call, so it calls __dma_get_required_mask(). Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: NDaniel Axtens <dja@axtens.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 23 7月, 2015 1 次提交
-
-
由 Jiang Liu 提交于
Use accessor for_each_pci_msi_entry() to access MSI device list, so we could easily move msi_list from struct pci_dev into struct device later. Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Stuart Yoder <stuart.yoder@freescale.com> Cc: Yijing Wang <wangyijing@huawei.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Olof Johansson <olof@lixom.net> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Daniel Axtens <dja@axtens.net> Cc: Wei Yang <weiyang@linux.vnet.ibm.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Alexander Gordeev <agordeev@redhat.com> Cc: Scott Wood <scottwood@freescale.com> Cc: Laurentiu Tudor <Laurentiu.Tudor@freescale.com> Cc: Tudor Laurentiu <b10716@freescale.com> Cc: Hongtao Jia <hongtao.jia@freescale.com> Link: http://lkml.kernel.org/r/1436428847-8886-4-git-send-email-jiang.liu@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 11 6月, 2015 7 次提交
-
-
由 Alexey Kardashevskiy 提交于
TCE tables might get too big in case of 4K IOMMU pages and DDW enabled on huge guests (hundreds of GB of RAM) so the kernel might be unable to allocate contiguous chunk of physical memory to store the TCE table. To address this, POWER8 CPU (actually, IODA2) supports multi-level TCE tables, up to 5 levels which splits the table into a tree of smaller subtables. This adds multi-level TCE tables support to pnv_pci_ioda2_table_alloc_pages() and pnv_pci_ioda2_table_free_pages() helpers. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alexey Kardashevskiy 提交于
At the moment writing new TCE value to the IOMMU table fails with EBUSY if there is a valid entry already. However PAPR specification allows the guest to write new TCE value without clearing it first. Another problem this patch is addressing is the use of pool locks for external IOMMU users such as VFIO. The pool locks are to protect DMA page allocator rather than entries and since the host kernel does not control what pages are in use, there is no point in pool locks and exchange()+put_page(oldtce) is sufficient to avoid possible races. This adds an exchange() callback to iommu_table_ops which does the same thing as set() plus it returns replaced TCE and DMA direction so the caller can release the pages afterwards. The exchange() receives a physical address unlike set() which receives linear mapping address; and returns a physical address as the clear() does. This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement for a platform to have exchange() implemented in order to support VFIO. This replaces iommu_tce_build() and iommu_clear_tce() with a single iommu_tce_xchg(). This makes sure that TCE permission bits are not set in TCE passed to IOMMU API as those are to be calculated by platform code from DMA direction. This moves SetPageDirty() to the IOMMU code to make it work for both VFIO ioctl interface in in-kernel TCE acceleration (when it becomes available later). Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: NAlex Williamson <alex.williamson@redhat.com> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alexey Kardashevskiy 提交于
This replaces direct accesses to TCE table with a helper which returns an TCE entry address. This does not make difference now but will when multi-level TCE tables get introduces. No change in behavior is expected. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alexey Kardashevskiy 提交于
So far one TCE table could only be used by one IOMMU group. However IODA2 hardware allows programming the same TCE table address to multiple PE allowing sharing tables. This replaces a single pointer to a group in a iommu_table struct with a linked list of groups which provides the way of invalidating TCE cache for every PE when an actual TCE table is updated. This adds pnv_pci_link_table_and_group() and pnv_pci_unlink_table_and_group() helpers to manage the list. However without VFIO, it is still going to be a single IOMMU group per iommu_table. This changes iommu_add_device() to add a device to a first group from the group list of a table as it is only called from the platform init code or PCI bus notifier and at these moments there is only one group per table. This does not change TCE invalidation code to loop through all attached groups in order to simplify this patch and because it is not really needed in most cases. IODA2 is fixed in a later patch. This should cause no behavioural change. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: NAlex Williamson <alex.williamson@redhat.com> Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alexey Kardashevskiy 提交于
The pnv_pci_ioda_tce_invalidate() helper invalidates TCE cache. It is supposed to be called on IODA1/2 and not called on p5ioc2. It receives start and end host addresses of TCE table. IODA2 actually needs PCI addresses to invalidate the cache. Those can be calculated from host addresses but since we are going to implement multi-level TCE tables, calculating PCI address from a host address might get either tricky or ugly as TCE table remains flat on PCI bus but not in RAM. This moves pnv_pci_ioda_tce_invalidate() from generic pnv_tce_build/ pnt_tce_free and defines IODA1/2-specific callbacks which call generic ones and do PHB-model-specific TCE cache invalidation. P5IOC2 keeps using generic callbacks as before. This changes pnv_pci_ioda2_tce_invalidate() to receives TCE index and number of pages which are PCI addresses shifted by IOMMU page shift. No change in behaviour is expected. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alexey Kardashevskiy 提交于
This adds a iommu_table_ops struct and puts pointer to it into the iommu_table struct. This moves tce_build/tce_free/tce_get/tce_flush callbacks from ppc_md to the new struct where they really belong to. This adds the requirement for @it_ops to be initialized before calling iommu_init_table() to make sure that we do not leave any IOMMU table with iommu_table_ops uninitialized. This is not a parameter of iommu_init_table() though as there will be cases when iommu_init_table() will not be called on TCE tables, for example - VFIO. This does s/tce_build/set/, s/tce_free/clear/ and removes "tce_" redundant prefixes. This removes tce_xxx_rm handlers from ppc_md but does not add them to iommu_table_ops as this will be done later if we decide to support TCE hypercalls in real mode. This removes _vm callbacks as only virtual mode is supported by now so this also removes @rm parameter. For pSeries, this always uses tce_buildmulti_pSeriesLP/ tce_buildmulti_pSeriesLP. This changes multi callback to fall back to tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not present. The reason for this is we still have to support "multitce=off" boot parameter in disable_multitce() and we do not want to walk through all IOMMU tables in the system and replace "multi" callbacks with single ones. For powernv, this defines _ops per PHB type which are P5IOC2/IODA1/IODA2. This makes the callbacks for them public. Later patches will extend callbacks for IODA1/2. No change in behaviour is expected. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alexey Kardashevskiy 提交于
Normally a bitmap from the iommu_table is used to track what TCE entry is in use. Since we are going to use iommu_table without its locks and do xchg() instead, it becomes essential not to put bits which are not implied in the direction flag as the old TCE value (more precisely - the permission bits) will be used to decide whether to put the page or not. This adds iommu_direction_to_tce_perm() (its counterpart is there already) and uses it for powernv's pnv_tce_build(). Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 03 6月, 2015 1 次提交
-
-
由 Michael Neuling 提交于
Currently pnv_pci_shutdown() calls the PHB shutdown code for all PHBs in the system. It dereferences the private_data assuming it's a powernv PHB, which won't be the case when we have different PHB in the systems (like when we add vPHBs for CXL). This moves the shutdown hook to the pci_controller_ops and fixes the call site to use that instead. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 02 6月, 2015 2 次提交
-
-
由 Daniel Axtens 提交于
Previously, dma_set_mask() on powernv was convoluted: 0) Call dma_set_mask() (a/p/kernel/dma.c) 1) In dma_set_mask(), ppc_md.dma_set_mask() exists, so call it. 2) On powernv, that function pointer is pnv_dma_set_mask(). In pnv_dma_set_mask(), the device is pci, so call pnv_pci_dma_set_mask(). 3) In pnv_pci_dma_set_mask(), call pnv_phb->set_dma_mask() if it exists. 4) It only exists in the ioda case, where it points to pnv_pci_ioda_dma_set_mask(), which is the final function. So the call chain is: dma_set_mask() -> pnv_dma_set_mask() -> pnv_pci_dma_set_mask() -> pnv_pci_ioda_dma_set_mask() Both ppc_md and pnv_phb function pointers are used. Rip out the ppc_md call, pnv_dma_set_mask() and pnv_pci_dma_set_mask(). Instead: 0) Call dma_set_mask() (a/p/kernel/dma.c) 1) In dma_set_mask(), the device is pci, and pci_controller_ops.dma_set_mask() exists, so call pci_controller_ops.dma_set_mask() 2) In the ioda case, that points to pnv_pci_ioda_dma_set_mask(). The new call chain is dma_set_mask() -> pnv_pci_ioda_dma_set_mask() Now only the pci_controller_ops function pointer is used. The fallback paths for p5ioc2 are the same. Previously, pnv_pci_dma_set_mask() would find no pnv_phb->set_dma_mask() function, to it would call __set_dma_mask(). Now, dma_set_mask() finds no ppc_md call or pci_controller_ops call, so it calls __set_dma_mask(). Signed-off-by: NDaniel Axtens <dja@axtens.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Daniel Axtens 提交于
Remove powernv generic PCI controller operations. Replace it with controller ops for each of the two supported PHBs. As an added bonus, make the two new structs const, which will help guard against bugs such as the one introduced in 65ebf4b6 ("powerpc/powernv: Move controller ops from ppc_md to controller_ops") Signed-off-by: NDaniel Axtens <dja@axtens.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 22 5月, 2015 1 次提交
-
-
由 Daniel Axtens 提交于
Move the PowerNV/BML platform to use the pci_controller_ops structure rather than ppc_md for MSI related PCI controller operations. Signed-off-by: NDaniel Axtens <dja@axtens.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 11 4月, 2015 1 次提交
-
-
由 Daniel Axtens 提交于
This moves the PowerNV platform to use the pci_controller_ops structure rather than ppc_md for PCI controller operations. Signed-off-by: NDaniel Axtens <dja@axtens.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 07 4月, 2015 1 次提交
-
-
由 Michael Ellerman 提交于
The powernv code has some conditional support for running on bare metal machines that have no OPAL firmware, but provide RTAS. No released machines ever supported that, and even in the lab it was just a transitional hack in the days when OPAL was still being developed. So remove the code. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: NStewart Smith <stewart@linux.vnet.ibm.com>
-
- 31 3月, 2015 1 次提交
-
-
由 Wei Yang 提交于
On PowerNV platform, resource position in M64 BAR implies the PE# the resource belongs to. In some cases, adjustment of a resource is necessary to locate it to a correct position in M64 BAR . This patch adds pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address according to an offset. Note: After doing so, there would be a "hole" in the /proc/iomem when offset is a positive value. It looks like the device return some mmio back to the system, which actually no one could use it. [bhelgaas: rework loops, rework overlap check, index resource[] conventionally, remove pci_regs.h include, squashed with next patch] Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 24 3月, 2015 1 次提交
-
-
由 Gavin Shan 提交于
The PCI config accessors previously relied on device_node. Unfortunately, VFs don't have a corresponding device_node, so change the accessors to use pci_dn instead. [bhelgaas: changelog] Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 04 3月, 2015 1 次提交
-
-
由 Nishanth Aravamudan 提交于
After d905c5df ("PPC: POWERNV: move iommu_add_device earlier"), the refcnt on the kobject backing the IOMMU group for a PCI device is elevated by each call to pci_dma_dev_setup_pSeriesLP() (via set_iommu_table_base_and_group). When we go to dlpar a multi-function PCI device out: iommu_reconfig_notifier -> iommu_free_table -> iommu_group_put BUG_ON(tbl->it_group) We trip this BUG_ON, because there are still references on the table, so it is not freed. Fix this by moving the powernv bus notifier to common code and calling it for both powernv and pseries. Fixes: d905c5df ("PPC: POWERNV: move iommu_add_device earlier") Signed-off-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com> Tested-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 23 1月, 2015 1 次提交
-
-
由 Gavin Shan 提交于
The callback (ppc_md.pci_probe_mode()) is used to determine if the child PCI devices of the indicated PCI bus should be probed from device-tree or hardware. On PowerNV platform, we always expect probing PCI devices from hardware, which is PowerPC PCI core's default behaviour. Also, the callback had some delay implemented based on PHB's device node property "reset-clear-timestamp", which wasn't exported from skiboot. So we don't need this function and it's safe to remove it. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-