提交 · 73d55c3b59f7d9cadc1dbc07d75ccee6c81fdf5b · openeuler / raspberrypi-kernel

09 3月, 2016 23 次提交

cxl: IRQ allocation for guests · 73d55c3b

由 Frederic Barrat 提交于 3月 04, 2016

The PSL interrupt cannot be multiplexed in a guest, as it is not
supported by the hypervisor. So an interrupt will be allocated
for it for each context. It will still be the first interrupt found in
the first interrupt range, but is treated almost like any other AFU
interrupt when creating/deleting the context. Only the handler is
different. Rework the code so that the range 0 is treated like the
other ranges.
Co-authored-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

73d55c3b

cxl: Update cxl_irq() prototype · 6d625ed9

由 Frederic Barrat 提交于 3月 04, 2016

The context parameter when calling cxl_irq() should be strongly typed.
Co-authored-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6d625ed9

cxl: Isolate a few bare-metal-specific calls · ea2d1f95

由 Frederic Barrat 提交于 3月 04, 2016

A few functions are mostly common between bare-metal and guest and
just need minor tuning. To avoid crowding the backend API, introduce a
few 'if' based on the CPU being in HV mode.
Co-authored-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ea2d1f95

cxl: Rename some bare-metal specific functions · 2b04cf31

由 Frederic Barrat 提交于 3月 04, 2016

Rename a few functions, changing the 'cxl_' prefix to either
'cxl_pci_' or 'cxl_native_', to make clear that the implementation is
bare-metal specific.

Those functions will have an equivalent implementation for a guest in
a later patch.
Co-authored-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2b04cf31

cxl: Introduce implementation-specific API · 5be587b1

由 Frederic Barrat 提交于 3月 04, 2016

The backend API (in cxl.h) lists some low-level functions whose
implementation is different on bare-metal and in a guest. Each
environment implements its own functions, and the common code uses
them through function pointers, defined in cxl_backend_ops
Co-authored-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5be587b1

cxl: Define process problem state area at attach time only · cca44c01

由 Frederic Barrat 提交于 3月 04, 2016

CXL kernel API was defining the process problem state area during
context initialization, making it possible to map the problem state
area before attaching the context. This won't work on a powerVM
guest. So force the logical behavior, like in userspace: attach first,
then map the problem state area.
Remove calls to cxl_assign_psn_space during init. The function is
already called on the attach paths.
Co-authored-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

cca44c01

cxl: Move bare-metal specific code to specialized files · d56d301b

由 Frederic Barrat 提交于 3月 04, 2016

Move a few functions around to better separate code specific to
bare-metal environment from code which will be commonly used between
guest and bare-metal.

Code specific to bare-metal is meant to be in native.c or pci.c
only. It's basically anything which touches the card p1 registers,
some p2 registers not needed from a guest and the PCI interface.
Co-authored-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d56d301b

cxl: Move common code away from bare-metal-specific files · 86331862

由 Christophe Lombard 提交于 3月 04, 2016

Move around some functions which will be accessed from the bare-metal
and guest environments.
Code in native.c and pci.c is meant to be bare-metal specific.
Other files contain code which may be shared with guests.
Co-authored-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: NChristophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
Acked-by: NIan Munsie <imunsie@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

86331862

powerpc/eeh: eeh_pci_enable(): fix checking of post-request state · 949e9b82

由 Andrew Donnellan 提交于 10月 23, 2015

In eeh_pci_enable(), after making the request to set the new options, we
call eeh_ops->wait_state() to check that the request finished successfully.

At the moment, if eeh_ops->wait_state() returns 0, we return 0 without
checking that it reflects the expected outcome. This can lead to callers
further up the chain incorrectly assuming the slot has been successfully
unfrozen and continuing to attempt recovery.

On powernv, this will occur if pnv_eeh_get_pe_state() or
pnv_eeh_get_phb_state() return 0, which in turn occurs if the relevant OPAL
call returns OPAL_EEH_STOPPED_MMIO_DMA_FREEZE or
OPAL_EEH_PHB_ERROR respectively.

On pseries, this will occur if pseries_eeh_get_state() returns 0, which in
turn occurs if RTAS reports that the PE is in the MMIO Stopped and DMA
Stopped states.

Obviously, none of these cases represent a successful completion of a
request to thaw MMIO or DMA.

Fix the check so that a wait_state() return value of 0 won't be considered
successful for the EEH_OPT_THAW_MMIO or EEH_OPT_THAW_DMA cases.
Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

949e9b82

powerpc/eeh: Remove duplicated check in eeh_dump_pe_log() · b6c7347f

由 Gavin Shan 提交于 2月 26, 2016

When eeh_dump_pe_log() is only called by eeh_slot_error_detail(),
we already have the check that the PE isn't in PCI config blocked
state in eeh_slot_error_detail(). So we needn't the duplicated
check in eeh_dump_pe_log().

This removes the duplicated check in eeh_dump_pe_log(). No logical
changes introduced.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b6c7347f

powerpc/eeh: Synchronize recovery in host/guest · eca036ee

由 Gavin Shan 提交于 3月 04, 2016

When passing through SRIOV VFs to guest, we possibly encounter EEH
error on PF. In this case, the VF PEs are put into frozen state.
The error could be reported to guest before it's captured by the
host. That means the guest could attempt to recover errors on VFs
before host gets chance to recover errors on PFs. The VFs won't be
recovered successfully.

This enforces the recovery order for above case: the recovery on
child PE in guest is hold until the recovery on parent PE in host
is completed.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NRussell Currey <ruscur@russell.cc>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

eca036ee

powerpc/eeh: Don't remove passed VFs · 3fa7bf72

由 Gavin Shan 提交于 3月 04, 2016

When we have partial hotplug as part of the error recovery on PF,
the VFs that are bound with vfio-pci driver will experience hotplug.
That's not allowed.

This checks if the VF PE is passed or not. If it does, we leave
the VF without removing it.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NRussell Currey <ruscur@russell.cc>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3fa7bf72

powerpc/eeh: Don't propagate error to guest · 2311cca5

由 Gavin Shan 提交于 3月 04, 2016

When EEH error happened to the parent PE of those PEs that have
been passed through to guest, the error is propagated to guest
domain and the VFIO driver's error handlers are called. It's not
correct as the error in the host domain shouldn't be propagated
to guests and affect them.

This adds one more limitation when calling EEH error handlers.
If the PE has been passed through to guest, the error handlers
won't be called.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NRussell Currey <ruscur@russell.cc>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2311cca5

powerpc/eeh: powerpc/eeh: Support error recovery for VF PE · 67086e32

由 Wei Yang 提交于 3月 04, 2016

PFs are enumerated on PCI bus, while VFs are created by PF's driver.

In EEH recovery, it has two cases:
1. Device and driver is EEH aware, error handlers are called.
2. Device and driver is not EEH aware, un-plug the device and plug it again
by enumerating it.

The special thing happens on the second case. For a PF, we could use the
original pci core to enumerate the bus, while for VF we need to record the
VFs which aer un-plugged then plug it again.

Also The patch caches the VF index in pci_dn, which can be used to
calculate VF's bus, device and function number. Those information helps to
locate the VF's PCI device instance when doing hotplug during EEH recovery
if necessary.
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

67086e32

powerpc/powernv: Support PCI config restore for VFs · 0dc2830e

由 Wei Yang 提交于 3月 04, 2016

After PE reset, OPAL API opal_pci_reinit() is called on all devices
contained in the PE to reinitialize them. While skiboot is not aware of
VFs, we have to implement the function in kernel to reinitialize VFs after
reset on PE for VFs.

In this patch, two functions pnv_pci_fixup_vf_mps() and
pnv_eeh_restore_vf_config() both manipulate the MPS of the VF, since for a
VF it has three cases.

1. Normal creation for a VF
   In this case, pnv_pci_fixup_vf_mps() is called to make the MPS a proper
   value compared with its parent.
2. EEH recovery without VF removed
   In this case, MPS is stored in pci_dn and pnv_eeh_restore_vf_config() is
   called to restore it and reinitialize other part.
3. EEH recovery with VF removed
   In this case, VF will be removed then re-created. Both functions are
   called. First pnv_pci_fixup_vf_mps() is called to store the proper MPS
   to pci_dn and then pnv_eeh_restore_vf_config() is called to do proper
   thing.

This introduces two functions: pnv_pci_fixup_vf_mps() to fixup the VF's
MPS to make sure it is equal to parent's and store this value in pci_dn
for future use. pnv_eeh_restore_vf_config() to re-initialize on VF by
restoring MPS, disabling completion timeout, enabling SERR, etc.
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0dc2830e

powerpc/powernv: Support EEH reset for VF PE · 9312bc5b

由 Wei Yang 提交于 3月 04, 2016

PEs for VFs don't have primary bus. So they have to have their own reset
backend, which is used during EEH recovery. The patch implements the reset
backend for VF's PE by issuing FLR or AF FLR to the VFs, which are contained
in the PE.
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

9312bc5b

powerpc/eeh: Create PE for VFs · c29fa27d

由 Wei Yang 提交于 3月 04, 2016

This creates PEs for VFs in the weak function pcibios_bus_add_device().
Those PEs for VFs are identified with newly introduced flag EEH_PE_VF
so that we treat them differently during EEH recovery.
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c29fa27d

powerpc/eeh: EEH device for VF · 39218cd0

由 Wei Yang 提交于 3月 04, 2016

VFs and their corresponding pdn are created and released dynamically
when their PF's SRIOV capability is enabled and disabled. This creates
and releases EEH devices for VFs when creating and releasing their pdn
instances, which means EEH devices and pdn instances have same life
cycle. Also, VF's EEH device is identified by (struct eeh_dev::physfn).
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

39218cd0

powerpc/eeh: Cache normal BARs, not windows or IOV BARs · 51c0e87e

由 Wei Yang 提交于 3月 04, 2016

This restricts the EEH address cache to use only the first 7 BARs. This
makes __eeh_addr_cache_insert_dev() ignore PCI bridge window and IOV BARs.
As the result of this change, eeh_addr_cache_get_dev() will return VFs from
VF's resource addresses instead of parent PFs.

This also removes PCI bridge check as we limit __eeh_addr_cache_insert_dev()
to 7 BARs and this effectively excludes PCI bridges from being cached.
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

51c0e87e

powerpc/pci: Remove VFs prior to PF · 971427f5

由 Wei Yang 提交于 3月 04, 2016

As commit ac205b7b ("PCI: make sriov work with hotplug remove")
indicates, VFs which is on the same PCI bus as their PF, should be
removed before the PF. Otherwise, we might run into kernel crash
at PCI unplugging time.

This applies the above pattern to powerpc PCI hotplug path.
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

971427f5

PCI: Add pcibios_bus_add_device() weak function · 7b77061f

由 Wei Yang 提交于 3月 04, 2016

This adds weak function pcibios_bus_add_device() for arch dependent
code could do proper setup. For example, powerpc could setup EEH
related resources for SRIOV VFs.
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7b77061f

PCI/IOV: Rename and export virtfn_{add, remove} · c194f7ea

由 Wei Yang 提交于 3月 04, 2016

During EEH recovery, hotplug is applied to the devices which don't
have drivers or their drivers don't support EEH. However, the hotplug,
which was implemented based on PCI bus, can't be applied to VF directly.
Instead, we unplug and plug individual PCI devices (VFs).

This renames virtn_{add,remove}() and exports them so they can be used
in PCI hotplug during EEH recovery.
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c194f7ea

powerpc/eeh: Reworked eeh_pe_bus_get() · 4eb0799f

由 Gavin Shan 提交于 2月 09, 2016

The original implementation is ugly: unnecessary if statements and
"out" tag. This reworks the function to avoid above weaknesses. No
functional changes introduced.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4eb0799f

03 3月, 2016 8 次提交

powerpc/mm: Move hash64 tlbflush code into a new header · ee3b93eb

由 Aneesh Kumar K.V 提交于 3月 01, 2016

No code changes.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ee3b93eb

powerpc/mm: Move hash related mmu-*.h headers to book3s/ · f64e8084

由 Aneesh Kumar K.V 提交于 3月 01, 2016

No code changes.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f64e8084

powerpc/mm: add _PAGE_HASHPTE similar to 4K hash · c367a441

由 Aneesh Kumar K.V 提交于 3月 01, 2016

We don't need to update linux page table entry with _PAGE_HASHPTE early
in hash pte fault. A parallel pte update will loop via _PAGE_BUSY
and look at _PAGE_HASHPTE for a required hpte flush only if
_PAGE_BUSY is cleared. That ensures a pte update will wait for a
parallel hpte insert to finish before looking at _PAGE_HASHPTE bit.

To avoid further confusion drop setting _PAGE_HASHPTE in cmpxchg in __hash_page_4K.

commit 41743a4e ("powerpc: Free a PTE bit on ppc64 with 64K pages")
did similar change for 64K config
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

c367a441

powerp/mm: Update code comments · e9a68147

由 Aneesh Kumar K.V 提交于 3月 01, 2016

We are updating pte in those functions.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e9a68147

mm: Some arch may want to use HPAGE_PMD related values as variables · ff20c2e0

由 Kirill A. Shutemov 提交于 3月 01, 2016

With next generation power processor, we are having a new mmu model
[1] that require us to maintain a different linux page table format.

Inorder to support both current and future ppc64 systems with a single
kernel we need to make sure kernel can select between different page
table format at runtime. With the new MMU (radix MMU) added, we will
have two different pmd hugepage size 16MB for hash model and 2MB for
Radix model. Hence make HPAGE_PMD related values as a variable.

Actual conversion of HPAGE_PMD to a variable for ppc64 happens in a
followup patch.

[1] http://ibm.biz/power-isa3 (Needs registration).
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ff20c2e0

powerpc/mm: Switch book3s 64 with 64K page size to 4 level page table · 368ced78

由 Aneesh Kumar K.V 提交于 3月 01, 2016

This is needed so that we can support both hash and radix page table
using single kernel. Radix kernel uses a 4 level table.

We now use physical address in upper page table tree levels. Even though
they are aligned to their size, for the masked bits we use the
bit positions as per PowerISA 3.0.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

368ced78

powerpc/mm: Don't have conditional defines for real_pte_t · ae9a71af

由 Aneesh Kumar K.V 提交于 3月 01, 2016

We remove real_pte_t out of STRICT_MM_TYPESCHECK.
Reviewed-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ae9a71af

powerpc/mm: Split pgtable types to separate header · 2bf59916

由 Aneesh Kumar K.V 提交于 3月 01, 2016

We move the page table accessors into a separate header. We will
later add a big endian variant of the table which is needed for radix.
No functionality change only code movement.
Reviewed-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2bf59916

02 3月, 2016 9 次提交

powerpc: Add the ability to save VSX without giving it up · bf6a4d5b

由 Cyril Bur 提交于 2月 29, 2016

This patch adds the ability to be able to save the VSX registers to the
thread struct without giving up (disabling the facility) next time the
process returns to userspace.

This patch builds on a previous optimisation for the FPU and VEC registers
in the thread copy path to avoid a possibly pointless reload of VSX state.
Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

bf6a4d5b

powerpc: Add the ability to save Altivec without giving it up · 6f515d84

由 Cyril Bur 提交于 2月 29, 2016

This patch adds the ability to be able to save the VEC registers to the
thread struct without giving up (disabling the facility) next time the
process returns to userspace.

This patch builds on a previous optimisation for the FPU registers in the
thread copy path to avoid a possibly pointless reload of VEC state.
Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

6f515d84

powerpc: Add the ability to save FPU without giving it up · 8792468d

由 Cyril Bur 提交于 2月 29, 2016

This patch adds the ability to be able to save the FPU registers to the
thread struct without giving up (disabling the facility) next time the
process returns to userspace.

This patch optimises the thread copy path (as a result of a fork() or
clone()) so that the parent thread can return to userspace with hot
registers avoiding a possibly pointless reload of FPU register state.
Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8792468d

powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two · de2a20aa

由 Cyril Bur 提交于 2月 29, 2016

This prepares for the decoupling of saving {fpu,altivec,vsx} registers and
marking {fpu,altivec,vsx} as being unused by a thread.

Currently giveup_{fpu,altivec,vsx}() does both however optimisations to
task switching can be made if these two operations are decoupled.
save_all() will permit the saving of registers to thread structs and leave
threads MSR with bits enabled.

This patch introduces no functional change.
Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

de2a20aa

powerpc: Restore FPU/VEC/VSX if previously used · 70fe3d98

由 Cyril Bur 提交于 2月 29, 2016

Currently the FPU, VEC and VSX facilities are lazily loaded. This is not
a problem unless a process is using these facilities.

Modern versions of GCC are very good at automatically vectorising code,
new and modernised workloads make use of floating point and vector
facilities, even the kernel makes use of vectorised memcpy.

All this combined greatly increases the cost of a syscall since the
kernel uses the facilities sometimes even in syscall fast-path making it
increasingly common for a thread to take an *_unavailable exception soon
after a syscall, not to mention potentially taking all three.

The obvious overcompensation to this problem is to simply always load
all the facilities on every exit to userspace. Loading up all FPU, VEC
and VSX registers every time can be expensive and if a workload does
avoid using them, it should not be forced to incur this penalty.

An 8bit counter is used to detect if the registers have been used in the
past and the registers are always loaded until the value wraps to back
to zero.

Several versions of the assembly in entry_64.S were tested:

  1. Always calling C.
  2. Performing a common case check and then calling C.
  3. A complex check in asm.

After some benchmarking it was determined that avoiding C in the common
case is a performance benefit (option 2). The full check in asm (option
3) greatly complicated that codepath for a negligible performance gain
and the trade-off was deemed not worth it.
Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
[mpe: Move load_vec in the struct to fill an existing hole, reword change log]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

fixup

70fe3d98

powerpc: Explicitly disable math features when copying thread · d272f667

由 Cyril Bur 提交于 2月 29, 2016

Currently when threads get scheduled off they always giveup the FPU,
Altivec (VMX) and Vector (VSX) units if they were using them. When they are
scheduled back on a fault is then taken to enable each facility and load
registers. As a result explicitly disabling FPU/VMX/VSX has not been
necessary.

Future changes and optimisations remove this mandatory giveup and fault
which could cause calls such as clone() and fork() to copy threads and run
them later with FPU/VMX/VSX enabled but no registers loaded.

This patch starts the process of having MSR_{FP,VEC,VSX} mean that a
threads registers are hot while not having MSR_{FP,VEC,VSX} means that the
registers must be loaded. This allows for a smarter return to userspace.
Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d272f667

selftests/powerpc: Test FPU and VMX regs in signal ucontext · 48e8c571

由 Cyril Bur 提交于 2月 29, 2016

Load up the non volatile FPU and VMX regs and ensure that they are the
expected value in a signal handler
Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

48e8c571

selftests/powerpc: Test preservation of FPU and VMX regs across preemption · e5ab8be6

由 Cyril Bur 提交于 2月 29, 2016

Loop in assembly checking the registers with many threads.
Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e5ab8be6

selftests/powerpc: Test the preservation of FPU and VMX regs across syscall · 01127f1e

由 Cyril Bur 提交于 2月 29, 2016

Test that the non volatile floating point and Altivec registers get
correctly preserved across the fork() syscall.

fork() works nicely for this purpose, the registers should be the same for
both parent and child
Signed-off-by: NCyril Bur <cyrilbur@gmail.com>
[mpe: Add include guards to basic_asm.h, minor formatting]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

01127f1e