提交 · acf6cec8365bfe1e6435e503eae4eab48ee04fc8 · openeuler / Kernel

19 8月, 2015 1 次提交

powerpc/512x: silence a USB Kconfig dependency warning · acf6cec8

由 Gerhard Sittig 提交于 6月 03, 2013

the PPC_MPC512x config automatically selected USB_EHCI_BIG_ENDIAN_*
switches, which made Kconfig warn about "unmet direct dependencies":

scripts/kconfig/conf --silentoldconfig Kconfig
warning: (PPC_MPC512x && 440EPX) selects USB_EHCI_BIG_ENDIAN_DESC which has unmet direct dependencies (USB_SUPPORT && USB && USB_EHCI_HCD)
warning: (PPC_MPC512x && PPC_PS3 && PPC_CELLEB && 440EPX) selects USB_EHCI_BIG_ENDIAN_MMIO which has unmet direct dependencies (USB_SUPPORT && USB && USB_EHCI_HCD)
warning: (PPC_MPC512x && 440EPX) selects USB_EHCI_BIG_ENDIAN_DESC which has unmet direct dependencies (USB_SUPPORT && USB && USB_EHCI_HCD)
warning: (PPC_MPC512x && PPC_PS3 && PPC_CELLEB && 440EPX) selects USB_EHCI_BIG_ENDIAN_MMIO which has unmet direct dependencies (USB_SUPPORT && USB && USB_EHCI_HCD)

make the selected entries additionally depend on USB_EHCI_HCD which
silences the warning
Signed-off-by: NGerhard Sittig <gsi@denx.de>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

acf6cec8

18 8月, 2015 3 次提交

powerpc/pseries: use kmemdup rather than duplicating its implementation · 2e16acc5

由 Andrzej Hajda 提交于 8月 07, 2015

The patch was generated using fixed coccinelle semantic patch
scripts/coccinelle/api/memdup.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2014320Signed-off-by: NAndrzej Hajda <a.hajda@samsung.com>
Reviewed-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2e16acc5

powerpc/powernv: move dma_get_required_mask from pnv_phb to pci_controller_ops · 53522982

由 Andrew Donnellan 提交于 8月 07, 2015

Simplify the dma_get_required_mask call chain by moving it from pnv_phb to
pci_controller_ops, similar to commit 763d2d8d ("powerpc/powernv:
Move dma_set_mask from pnv_phb to pci_controller_ops").

Previous call chain:

  0) call dma_get_required_mask() (kernel/dma.c)
  1) call ppc_md.dma_get_required_mask, if it exists. On powernv, that
     points to pnv_dma_get_required_mask() (platforms/powernv/setup.c)
  2) device is PCI, therefore call pnv_pci_dma_get_required_mask()
     (platforms/powernv/pci.c)
  3) call phb->dma_get_required_mask if it exists
  4) it only exists in the ioda case, where it points to
       pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c)

New call chain:

  0) call dma_get_required_mask() (kernel/dma.c)
  1) device is PCI, therefore call pci_controller_ops.dma_get_required_mask
     if it exists
  2) in the ioda case, that points to pnv_pci_ioda_dma_get_required_mask()
     (platforms/powernv/pci-ioda.c)

In the p5ioc2 case, the call chain remains the same -
dma_get_required_mask() does not find either a ppc_md call or
pci_controller_ops call, so it calls __dma_get_required_mask().
Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

53522982

powerpc/cell: Drop support for 64K local store on 4K kernels · f444f1f8

由 Michael Ellerman 提交于 8月 07, 2015

Back in the olden days we added support for using 64K pages to map the
SPU (Synergistic Processing Unit) local store on Cell, when the main
kernel was using 4K pages.

This was useful at the time because distros were using 4K pages, but
using 64K pages on the SPUs could reduce TLB pressure there.

However these days the number of Cell users is approaching zero, and
supporting this option adds unpleasant complexity to the memory
management code.

So drop the option, CONFIG_SPU_FS_64K_LS, and all related code.
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Acked-by: NJeremy Kerr <jk@ozlabs.org>

f444f1f8

14 8月, 2015 1 次提交

powerpc: Add an inline function to update POWER8 HID0 · e63dbd16

由 Gautham R. Shenoy 提交于 8月 05, 2015

Section 3.7 of Version 1.2 of the Power8 Processor User's Manual
prescribes that updates to HID0 be preceded by a SYNC instruction and
followed by an ISYNC instruction (Page 91).

Create an inline function name update_power8_hid0() which follows this
recipe and invoke it from the static split core path.
Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
Tested-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e63dbd16

06 8月, 2015 4 次提交

powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable HMI. · 62521ea6

由 Mahesh Salgaonkar 提交于 8月 04, 2015

Invoke new opal_cec_reboot2() call with reboot type
OPAL_REBOOT_PLATFORM_ERROR (for unrecoverable HMI interrupts) to inform
BMC/OCC about this error, so that BMC can collect relevant data for error
analysis and decide what component to de-configure before rebooting.
Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

62521ea6

powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable machine check errors. · e784b649

由 Mahesh Salgaonkar 提交于 7月 31, 2015

On non-recoverable MCE errors in kernel space, Linux kernel panics
and system reboots. On BMC based system opal-prd runs as a daemon
in the host. Hence, kernel crash may prevent opal-prd to detect and
analyze this MCE error. This may land us in a situation where the faulty
memory never gets de-configured and Linux would keep hitting same MCE error
again and again. If this happens in early stage of kernel initialization,
then Linux will keep crashing and rebooting in a loop.

This patch fixes this issue by invoking new opal_cec_reboot2() call with
reboot type OPAL_REBOOT_PLATFORM_ERROR to inform BMC/OCC about this
error, so that BMC can collect relevant data for error analysis and
decide what component to de-configure before rebooting.

This patch is dependent on OPAL patchset posted on skiboot mailing list
at https://lists.ozlabs.org/pipermail/skiboot/2015-July/001771.html that
introduces opal_cec_reboot2() opal call.
Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e784b649

powerpc/powernv: Pull all HMI events before panic. · 1852ae27

由 Mahesh Salgaonkar 提交于 5月 05, 2015

In the event of unrecovered HMI the existing code panics as soon as
it receives the first unrecovered HMI event. This makes host to report
partial information about HMIs before panic. There may be more errors
which would have caused the HMI and hence more HMI event would have been
generated waiting to be pulled by host. This patch implements a logic to
pull and display all the HMI event before going down panic path.
Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1852ae27

powerpc/powernv: display reason for Malfunction Alert HMI. · c33e11d0

由 Mahesh Salgaonkar 提交于 5月 05, 2015

The V2 version of HMI event now carries additional information for
Malfunction Alert. It now contains error information about CORE and NX
checkstop. This patch checks and displays the check stop reason before
panic.
Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: NStewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c33e11d0

23 7月, 2015 2 次提交

powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_* · 01c9348c

由 Paul Mackerras 提交于 7月 17, 2015

The hardware RNG on POWER8 and POWER7+ can be relatively slow, since
it can only supply one 64-bit value per microsecond.  Currently we
read it in arch_get_random_long(), but that slows down reading from
/dev/urandom since the code in random.c calls arch_get_random_long()
for every longword read from /dev/urandom.

Since the hardware RNG supplies high-quality entropy on every read, it
matches the semantics of arch_get_random_seed_long() better than those
of arch_get_random_long().  Therefore this commit makes the code use
the POWER8/7+ hardware RNG only for arch_get_random_seed_{long,int}
and not for arch_get_random_{long,int}.

This won't affect any other PowerPC-based platforms because none of
them currently support a hardware RNG.  To make it clear that the
ppc_md function pointer is used for arch_get_random_seed_*, we rename
it from get_random_long to get_random_seed.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

01c9348c

powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers · 1c2cb594

由 Thomas Huth 提交于 7月 17, 2015

The EPOW interrupt handler uses rtas_get_sensor(), which in turn
uses rtas_busy_delay() to wait for RTAS becoming ready in case it
is necessary. But rtas_busy_delay() is annotated with might_sleep()
and thus may not be used by interrupts handlers like the EPOW handler!
This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled:

 BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
 Call Trace:
 [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
 [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
 [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
 [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
 [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
 [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
 [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
 [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
 [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
 [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
 [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
 [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
 [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180

Fix this issue by introducing a new rtas_get_sensor_fast() function
that does not use rtas_busy_delay() - and thus can only be used for
sensors that do not cause a BUSY condition - known as "fast" sensors.

The EPOW sensor is defined to be "fast" in sPAPR - mpe.

Fixes: 587f83e8 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
Signed-off-by: NThomas Huth <thuth@redhat.com>
Reviewed-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1c2cb594

21 7月, 2015 2 次提交

powerpc/eeh: Dump PHB diag-data for non-existing PE · 79cd9520

由 Gavin Shan 提交于 5月 12, 2015

When detecting EEH error on non-existing PE, including the reserved
one, the PE is simply unfrozen without dumping the PHB diag-data,
which is useful for locating the root cause of the EEH error. The
patch dumps the PHB diag-data when non-existing PE reports error.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

79cd9520

powerpc/eeh: Fix wrong printed PE number · 0f36db77

由 Gavin Shan 提交于 5月 12, 2015

On LE kernel, the non-existing PE number in BE format derived from
skiboot firmware isn't converted to LE format properly as following
kernel log indicates:

   EEH: Clear non-existing PHB#4-PE#200000000000000
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0f36db77

16 7月, 2015 1 次提交

powerpc/powernv: Add poweroff (EPOW, DPO) events support for PowerNV platform · 3b476aad

由 Vipin K Parashar 提交于 7月 08, 2015

This patch adds support for OPAL EPOW (Environmental and Power Warnings)
and DPO (Delayed Power Off) events for the PowerNV platform. These events
are generated on FSP (Flexible Service Processor) based systems. EPOW
events are generated due to various critical system conditions that
require system shutdown. A few examples of these conditions are high
ambient temperature or system running on UPS power with low UPS battery.
DPO event is generated in response to admin initiated system shutdown
request. Upon receipt of EPOW and DPO events the host kernel invokes
orderly_poweroff() for performing graceful system shutdown.
Signed-off-by: NVipin K Parashar <vipin@linux.vnet.ibm.com>
Acked-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3b476aad

13 7月, 2015 7 次提交

powerpc/powernv: Unfreeze VF PE on releasing it · f951e510

由 Gavin Shan 提交于 6月 23, 2015

When releasing PE for SRIOV VF, the PE is forced to be frozen
wrongly. When the same PE is picked for another VF, it won't
work anyhow. The patch fixes the issue by unfreezing, not
freezing the VF PE when releasing it.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f951e510

powerpc/powernv: Include VF PE in PELTV of PF PE · 283e2d8a

由 Gavin Shan 提交于 6月 22, 2015

The PELTV of PF PE should include VF PE, which is missed by current
code, so that the VF PE is frozen automatically when freezing PF PE.
The patch fixes the PELTV of PF PE to include VF PE.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

283e2d8a

powerpc/powernv: Pick M64 PEs based on BARs · 26ba248d

由 Gavin Shan 提交于 6月 19, 2015

On PHB3, PE might be reserved in advance to reflect the M64 segments
consumed by the PE according to M64 BARs (exclude VF BARs) of the PCI
devices included in the PE. The PE is picked based on M64 BARs instead
of the bridge's M64 windows, which might include VF BARs. Otherwise,
wrong PE could be picked.

The patch calculates the used M64 segments and PE numbers according to
the M64 BARs, excluding VF BARs, of PCI devices in one particular PE,
instead of the bridge's M64 windows. Then the right PE number is picked.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

26ba248d

powerpc/powernv: Boolean argument for pnv_ioda_setup_bus_PE() · d1203852

由 Gavin Shan 提交于 6月 19, 2015

The patch changes the type of last argument of pnv_ioda_setup_bus_PE()
and phb::pick_m64_pe() to boolean. No functional change.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d1203852

powerpc/powernv: Reserve M64 PEs based on BARs · 96a2f92b

由 Gavin Shan 提交于 6月 19, 2015

On PHB3, some PEs might be reserved in advance to reflect the M64
segments consumed by those PEs. We're reserving PEs based on the
M64 window of root port, which might contain VF BAR. The PEs for
VFs are allocated dynamically, not reserved based on the consumed
M64 segments. So the M64 window of root port isn't reliable for
the task. Instead, we go through M64 BARs (VF BARs excluded) of
PCI devices under the specified root bus and reserve PEs accordingly,
as the patch does.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

96a2f92b

powerpc/powernv: Allow to reserve one PE for multiple times · e9dc4d7f

由 Gavin Shan 提交于 6月 19, 2015

The PE numbers are reserved according to root port's M64 window,
which is aligned to M64 segment finely. So one PE shouldn't be
reserved for multiple times. We will reserve PE numbers according
to the M64 BARs of PCI device in subsequent patches, which aren't
aligned to M64 segment size finely. It means one particular PE
could be reserved for multiple times.

The patch allows one PE to be reserved for multiple times and we
print the warning message at debugging level.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e9dc4d7f

powerpc/iommu: Cleanup setting of DMA base/offset · e91c2511

由 Benjamin Herrenschmidt 提交于 6月 24, 2015

Now that the table and the offset can co-exist, we no longer need
to flip/flop, we can just establish both once at boot time.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e91c2511

06 7月, 2015 2 次提交

powerpc/powernv: Fix opal-elog interrupt handler · a8956a7b

由 Alistair Popple 提交于 7月 03, 2015

The conversion of opal events to a proper irqchip means that handlers
are called until the relevant opal event has been cleared by
processing it. Events that queue work should therefore use a threaded
handler to mask the event until processing is complete.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a8956a7b

powerpc/powernv: Fix vma page prot flags in opal-prd driver · d8ea782b

由 Vaidyanathan Srinivasan 提交于 6月 29, 2015

opal-prd driver will mmap() firmware code/data area as private
mapping to prd user space daemon.  Write to this page will
trigger COW faults.  The new COW pages are normal kernel RAM
pages accounted by the kernel and are not special.

vma->vm_page_prot value will be used at page fault time
for the new COW pages, while pgprot_t value passed in
remap_pfn_range() is used for the initial page table entry.

Hence:
* Do not add _PAGE_SPECIAL in vma, but only for remap_pfn_range()
* Also remap_pfn_range() will add the _PAGE_SPECIAL flag using
  pte_mkspecial() call, hence no need to specify in the driver

This fix resolves the page accounting warning shown below:
BUG: Bad rss-counter state mm:c0000007d34ac600 idx:1 val:19

The above warning is triggered since _PAGE_SPECIAL was incorrectly
being set for the normal kernel COW pages.
Signed-off-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: NJeremy Kerr <jk@ozlabs.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d8ea782b

24 6月, 2015 1 次提交
- A
  make simple_positive() public · dc3f4198
  由 Al Viro 提交于 5月 18, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  dc3f4198
19 6月, 2015 1 次提交

powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma() · 5c89a87d

由 Alexey Kardashevskiy 提交于 6月 18, 2015

When pnv_pci_ioda_fixup() is called during PHB fixup time, each PE in
the sorted list of PEs (phb::pe_dma_list) is iterated to setup the PE's
DMA32 space by pnv_ioda_setup_bus_dma() if the PE's DMA32 weight is bigger
than zero. The function also assigns all the subordinate PCI devices of
the PE's primary bus with the PE's DMA32 IOMMU table. It causes the PCI
devicess in the child PEs, which don't have DMA weight, receives wrong
IOMMU table and then IOMMU group.

The patch fixes above issue by more check on the PE's coverage and don't
assign IOMMU table to those PCI devices, which belong to the child PEs.
The problem was found on Firestone platform initially.
Suggested-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5c89a87d

18 6月, 2015 3 次提交

powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off · b5926430

由 Alexey Kardashevskiy 提交于 6月 15, 2015

The pnv_pci_ioda2_unset_window() function is used to do the final
cleanup of a DMA window being released:
- via VFIO ioctl by the guest request;
- via unplugging a virtual PCI function.
However the function was under #ifdef CONFIG_IOMMU_API and was missing.

This moves the helper outside of IOMMU_API block and enables it
for either or both IOMMU_API and PCI_IOV.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b5926430

powerpc/powernv: fix construction of opal PRD messages · 7185795a

由 Jeremy Kerr 提交于 6月 15, 2015

We currently have a bug in the PRD code, where the contents of an
incoming message (beyond the header) will be overwritten by the list
item manipulations when adding to to the prd_msg_queue.

This change reorders struct opal_prd_msg_queue_item, so that the
message body doesn't overlap the list_head.

We also clarify the memcpy of the message, as we're copying unnecessary
bytes at the end of the message data.
Signed-off-by: NJeremy Kerr <jk@ozlabs.org>
Acked-by: NStewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7185795a

powerpc/powernv: Increase opal-irqchip initcall priority · 02b6505c

由 Alistair Popple 提交于 6月 17, 2015

The eeh subsystem for powernv requires the opal event irqchip to be
initialised prior to initialisation or the following errors are
produced (and eeh doesn't work as expected):

irq: XICS didn't like hwirq-0x9 to VIRQ17 mapping (rc=-22)
pnv_eeh_post_init: Can't request OPAL event interrupt (0)

On powernv eeh is initialised from a subsys_initcall due to a check
for machine_is(powernv) in eeh_init(). This patch increases the
initcall priority of opal_event_init() to an arch_initcall to ensure
the opal event interface is initialised prior to any users of it.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Reported-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

02b6505c

17 6月, 2015 2 次提交

powerpc: don't use module_init in non-modular 83xx suspend code · a390a2f1

由 Paul Gortmaker 提交于 5月 01, 2015

The suspend.o is built for SUSPEND -- which is bool, and hence
this code is either present or absent.  It will never be modular,
so using module_init as an alias for __initcall can be somewhat
misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of device_initcall
directly in this change means that the runtime impact is
zero -- it will remain at level 6 in initcall ordering.

Cc: Scott Wood <scottwood@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

a390a2f1

powerpc: use device_initcall for registering rtc devices · 8f6b9512

由 Paul Gortmaker 提交于 5月 01, 2015

Currently these two RTC devices are in core platform code
where it is not possible for them to be modular.  It will
never be modular, so using module_init as an alias for
__initcall can be somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of device_initcall
directly in this change means that the runtime impact is
zero -- they will remain at level 6 in initcall ordering.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Geoff Levand <geoff@infradead.org>
Acked-by: NGeoff Levand <geoff@infradead.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

8f6b9512

15 6月, 2015 1 次提交

powerpc/powernv: pnv_init_idle_states() should only run on powernv · 4bece972

由 Michael Ellerman 提交于 6月 15, 2015

Although this init call checks for device tree properties before doing
anything, it should still only run on powernv machines.
Reviewed-by: NShreyas B Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4bece972

11 6月, 2015 9 次提交

vfio: powerpc/spapr: powerpc/powernv/ioda2: Use DMA windows API in ownership control · 46d3e1e1

由 Alexey Kardashevskiy 提交于 6月 05, 2015

Before the IOMMU user (VFIO) would take control over the IOMMU table
belonging to a specific IOMMU group. This approach did not allow sharing
tables between IOMMU groups attached to the same container.

This introduces a new IOMMU ownership flavour when the user can not
just control the existing IOMMU table but remove/create tables on demand.
If an IOMMU implements take/release_ownership() callbacks, this lets
the user have full control over the IOMMU group. When the ownership
is taken, the platform code removes all the windows so the caller must
create them.
Before returning the ownership back to the platform code, VFIO
unprograms and removes all the tables it created.

This changes IODA2's onwership handler to remove the existing table
rather than manipulating with the existing one. From now on,
iommu_take_ownership() and iommu_release_ownership() are only called
from the vfio_iommu_spapr_tce driver.

Old-style ownership is still supported allowing VFIO to run on older
P5IOC2 and IODA IO controllers.

No change in userspace-visible behaviour is expected. Since it recreates
TCE tables on each ownership change, related kernel traces will appear
more often.

This adds a pnv_pci_ioda2_setup_default_config() which is called
when PE is being configured at boot time and when the ownership is
passed from VFIO to the platform code.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: NAlex Williamson <alex.williamson@redhat.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

46d3e1e1

powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table · 00547193

由 Alexey Kardashevskiy 提交于 6月 05, 2015

This adds a way for the IOMMU user to know how much a new table will
use so it can be accounted in the locked_vm limit before allocation
happens.

This stores the allocated table size in pnv_pci_ioda2_get_table_size()
so the locked_vm counter can be updated correctly when a table is
being disposed.

This defines an iommu_table_group_ops callback to let VFIO know
how much memory will be locked if a table is created.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

00547193

powerpc/powernv/ioda2: Use new helpers to do proper cleanup on PE release · c035e37b

由 Alexey Kardashevskiy 提交于 6月 05, 2015

The existing code programmed TVT#0 with some address and then
immediately released that memory.

This makes use of pnv_pci_ioda2_unset_window() and
pnv_pci_ioda2_set_bypass() which do correct resource release and
TVT update.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c035e37b

vfio: powerpc/spapr: powerpc/powernv/ioda: Define and implement DMA windows API · 4793d65d

由 Alexey Kardashevskiy 提交于 6月 05, 2015

This extends iommu_table_group_ops by a set of callbacks to support
dynamic DMA windows management.

create_table() creates a TCE table with specific parameters.
it receives iommu_table_group to know nodeid in order to allocate
TCE table memory closer to the PHB. The exact format of allocated
multi-level table might be also specific to the PHB model (not
the case now though).
This callback calculated the DMA window offset on a PCI bus from @num
and stores it in a just created table.

set_window() sets the window at specified TVT index + @num on PHB.

unset_window() unsets the window from specified TVT.

This adds a free() callback to iommu_table_ops to free the memory
(potentially a tree of tables) allocated for the TCE table.

create_table() and free() are supposed to be called once per
VFIO container and set_window()/unset_window() are supposed to be
called for every group in a container.

This adds IOMMU capabilities to iommu_table_group such as default
32bit window parameters and others. This makes use of new values in
vfio_iommu_spapr_tce. IODA1/P5IOC2 do not support DDW so they do not
advertise pagemasks to the userspace.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: NAlex Williamson <alex.williamson@redhat.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4793d65d

powerpc/powernv: Implement multilevel TCE tables · bbb845c4

由 Alexey Kardashevskiy 提交于 6月 05, 2015

TCE tables might get too big in case of 4K IOMMU pages and DDW enabled
on huge guests (hundreds of GB of RAM) so the kernel might be unable to
allocate contiguous chunk of physical memory to store the TCE table.

To address this, POWER8 CPU (actually, IODA2) supports multi-level
TCE tables, up to 5 levels which splits the table into a tree of
smaller subtables.

This adds multi-level TCE tables support to
pnv_pci_ioda2_table_alloc_pages() and pnv_pci_ioda2_table_free_pages()
helpers.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

bbb845c4

powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window · 43cb60ab

由 Alexey Kardashevskiy 提交于 6月 05, 2015

This is a part of moving DMA window programming to an iommu_ops
callback. pnv_pci_ioda2_set_window() takes an iommu_table_group as
a first parameter (not pnv_ioda_pe) as it is going to be used as
a callback for VFIO DDW code.

This should cause no behavioural change.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

43cb60ab

powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages · aca6913f

由 Alexey Kardashevskiy 提交于 6月 05, 2015

This is a part of moving TCE table allocation into an iommu_ops
callback to support multiple IOMMU groups per one VFIO container.

This moves the code which allocates the actual TCE tables to helpers:
pnv_pci_ioda2_table_alloc_pages() and pnv_pci_ioda2_table_free_pages().
These do not allocate/free the iommu_table struct.

This enforces window size to be a power of two.

This should cause no behavioural change.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

aca6913f

powerpc/powernv/ioda2: Rework iommu_table creation · e5aad1e6

由 Alexey Kardashevskiy 提交于 6月 05, 2015

This moves iommu_table creation to the beginning to make following changes
easier to review. This starts using table parameters from the iommu_table
struct.

This should cause no behavioural change.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e5aad1e6

powerpc/iommu/powernv: Release replaced TCE · 05c6cfb9

由 Alexey Kardashevskiy 提交于 6月 05, 2015

At the moment writing new TCE value to the IOMMU table fails with EBUSY
if there is a valid entry already. However PAPR specification allows
the guest to write new TCE value without clearing it first.

Another problem this patch is addressing is the use of pool locks for
external IOMMU users such as VFIO. The pool locks are to protect
DMA page allocator rather than entries and since the host kernel does
not control what pages are in use, there is no point in pool locks and
exchange()+put_page(oldtce) is sufficient to avoid possible races.

This adds an exchange() callback to iommu_table_ops which does the same
thing as set() plus it returns replaced TCE and DMA direction so
the caller can release the pages afterwards. The exchange() receives
a physical address unlike set() which receives linear mapping address;
and returns a physical address as the clear() does.

This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement
for a platform to have exchange() implemented in order to support VFIO.

This replaces iommu_tce_build() and iommu_clear_tce() with
a single iommu_tce_xchg().

This makes sure that TCE permission bits are not set in TCE passed to
IOMMU API as those are to be calculated by platform code from
DMA direction.

This moves SetPageDirty() to the IOMMU code to make it work for both
VFIO ioctl interface in in-kernel TCE acceleration (when it becomes
available later).
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: NAlex Williamson <alex.williamson@redhat.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

05c6cfb9

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功