提交 · 129fd4233b62159d50c35fb6489cb22ee9c27415 · openeuler / Kernel

22 8月, 2015 2 次提交

KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig · 129fd423

由 Thomas Huth 提交于 5月 22, 2015

Since the PPC970 support has been removed from the kvm-hv kernel
module recently, we should also reflect this change in the help
text of the corresponding Kconfig option.
Signed-off-by: NThomas Huth <thuth@redhat.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

129fd423

KVM: PPC: fix suspicious use of conditional operator · f5ffe330

由 Tudor Laurentiu 提交于 5月 25, 2015

This was signaled by a static code analysis tool.
Signed-off-by: NLaurentiu Tudor <Laurentiu.Tudor@freescale.com>
Reviewed-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

f5ffe330

08 7月, 2015 1 次提交

powerpc/perf/24x7: Fix lockdep warning · 442053e5

由 Sukadev Bhattiprolu 提交于 7月 07, 2015

The sysfs attributes for the 24x7 counters are dynamically allocated.
Initialize the attributes using sysfs_attr_init() to fix following
warning which occurs when CONFIG_DEBUG_LOCK_VMALLOC=y.

[    0.346249] audit: initializing netlink subsys (disabled)
[    0.346284] audit: type=2000 audit(1436295254.340:1): initialized
[    0.346489] BUG: key c0000000efe90198 not in .data!
[    0.346491] DEBUG_LOCKS_WARN_ON(1)
[    0.346502] ------------[ cut here ]------------
[    0.346504] WARNING: at ../kernel/locking/lockdep.c:3002
[    0.346506] Modules linked in:
Reported-by: NGustavo Luiz Duarte <gustavold@linux.vnet.ibm.com>
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Tested-by: NGustavo Luiz Duarte <gustavold@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

442053e5

07 7月, 2015 1 次提交

powerpc/powernv: Fix race in updating core_idle_state · b32aadc1

由 Shreyas B. Prabhu 提交于 7月 07, 2015

core_idle_state is maintained for each core. It uses 0-7 bits to track
whether a thread in the core has entered fastsleep or winkle. 8th bit is
used as a lock bit.
The lock bit is set in these 2 scenarios-
 - The thread is first in subcore to wakeup from sleep/winkle.
 - If its the last thread in the core about to enter sleep/winkle

While the lock bit is set, if any other thread in the core wakes up, it
loops until the lock bit is cleared before proceeding in the wakeup
path. This helps prevent race conditions w.r.t fastsleep workaround and
prevents threads from switching to process context before core/subcore
resources are restored.

But, in the path to sleep/winkle entry, we currently don't check for
lock-bit. This exposes us to following race when running with subcore
on-

First thread in the subcorea		Another thread in the same
waking up		   		core entering sleep/winkle

lwarx   r15,0,r14
ori     r15,r15,PNV_CORE_IDLE_LOCK_BIT
stwcx.  r15,0,r14
[Code to restore subcore state]

						lwarx   r15,0,r14
						[clear thread bit]
						stwcx.  r15,0,r14

andi.   r15,r15,PNV_CORE_IDLE_THREAD_BITS
stw     r15,0(r14)

Here, after the thread entering sleep clears its thread bit in
core_idle_state, the value is overwritten by the thread waking up.
In such cases when the core enters fastsleep, code mistakes an idle
thread as running. Because of this, the first thread waking up from
fastsleep which is supposed to resync timebase skips it. So we can
end up having a core with stale timebase value.

This patch fixes the above race by looping on the lock bit even while
entering the idle states.
Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Fixes: 7b54e9f213f76 'powernv/powerpc: Add winkle support for offline cpus'
Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b32aadc1

06 7月, 2015 5 次提交

powerpc/powernv: Fix opal-elog interrupt handler · a8956a7b

由 Alistair Popple 提交于 7月 03, 2015

The conversion of opal events to a proper irqchip means that handlers
are called until the relevant opal event has been cleared by
processing it. Events that queue work should therefore use a threaded
handler to mask the event until processing is complete.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a8956a7b

powerpc/ppc4xx_hsta_msi: Include ppc-pci.h to fix reference to hose_list · aaf6fd5c

由 Daniel Axtens 提交于 7月 06, 2015

An earlier commit referenced 'hose_list' in sysdev/ppc4xx_hsta_msi.c.
hose_list is defined in ppc-pci.h, which was not included in that
file. Include it, fixing the build for the akebono defconfig used
by the kbuild test robot.

Fixes: f2c800aa ("powerpc/ppc4xx_hsta_msi: Move MSI-related ops to pci_controller_ops")
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

aaf6fd5c

powerpc: Add plain English description for alignment exception oopses · eab861a7

由 Anton Blanchard 提交于 7月 02, 2015

If we take an alignment exception which we cannot fix, the oops
currently prints:

Unable to handle kernel paging request for unknown fault

Lets print something more useful:

Unable to handle kernel paging request for unaligned access at address 0xc0000000f77bba8f
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

eab861a7

powerpc: Set the correct kernel taint on machine check errors. · 27ea2c42

由 Daniel Axtens 提交于 6月 15, 2015

This means the 'M' flag will work properly when the kernel prints a backtrace.
Signed-off-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

27ea2c42

powerpc/powernv: Fix vma page prot flags in opal-prd driver · d8ea782b

由 Vaidyanathan Srinivasan 提交于 6月 29, 2015

opal-prd driver will mmap() firmware code/data area as private
mapping to prd user space daemon.  Write to this page will
trigger COW faults.  The new COW pages are normal kernel RAM
pages accounted by the kernel and are not special.

vma->vm_page_prot value will be used at page fault time
for the new COW pages, while pgprot_t value passed in
remap_pfn_range() is used for the initial page table entry.

Hence:
* Do not add _PAGE_SPECIAL in vma, but only for remap_pfn_range()
* Also remap_pfn_range() will add the _PAGE_SPECIAL flag using
  pte_mkspecial() call, hence no need to specify in the driver

This fix resolves the page accounting warning shown below:
BUG: Bad rss-counter state mm:c0000007d34ac600 idx:1 val:19

The above warning is triggered since _PAGE_SPECIAL was incorrectly
being set for the normal kernel COW pages.
Signed-off-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: NJeremy Kerr <jk@ozlabs.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d8ea782b

26 6月, 2015 1 次提交

mm/hugetlb: remove arch_prepare/release_hugepage from arch headers · 08bd4fc1

由 Dominik Dingel 提交于 6月 25, 2015

Nobody used these hooks so they were removed from common code, and can now
be removed from the architectures.
Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: NRalf Baechle <ralf@linux-mips.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

08bd4fc1

25 6月, 2015 8 次提交

mm: clarify that the function operates on hugepage pte · 8809aa2d

由 Aneesh Kumar K.V 提交于 6月 24, 2015

We have confusing functions to clear pmd, pmd_clear_* and pmd_clear.  Add
_huge_ to pmdp_clear functions so that we are clear that they operate on
hugepage pte.

We don't bother about other functions like pmdp_set_wrprotect,
pmdp_clear_flush_young, because they operate on PTE bits and hence
indicate they are operating on hugepage ptes
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8809aa2d

powerpc/mm: use generic version of pmdp_clear_flush() · f28b6ff8

由 Aneesh Kumar K.V 提交于 6月 24, 2015

Also move the pmd_trans_huge check to generic code.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f28b6ff8

mm/thp: split out pmd collapse flush into separate functions · 15a25b2e

由 Aneesh Kumar K.V 提交于 6月 24, 2015

Architectures like ppc64 [1] need to do special things while clearing pmd
before a collapse. For them this operation is largely different from a
normal hugepage pte clear. Hence add a separate function to clear pmd
before collapse. After this patch pmdp_* functions operate only on
hugepage pte, and not on regular pmd_t values pointing to page table.

[1] ppc64 needs to invalidate all the normal page pte mappings we already
have inserted in the hardware hash page table. But before doing that we
need to make sure there are no parallel hash page table insert going on.
So we need to do a kick_all_cpus_sync() before flushing the older hash
table entries. By moving this to a separate function we capture these
details and mention how it is different from a hugepage pte clear.

This patch is a cleanup and only does code movement for clarity. There
should not be any change in functionality.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

15a25b2e

mm/hugetlb: reduce arch dependent code about hugetlb_prefault_arch_hook · a67a31fa

由 Zhang Zhen 提交于 6月 24, 2015

Currently we have many duplicates in definitions of
hugetlb_prefault_arch_hook.  In all architectures this function is empty.
Signed-off-by: NZhang Zhen <zhenzhang.zhang@huawei.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a67a31fa

powerpc/mm: tracking vDSO remap · 83d3f0e9

由 Laurent Dufour 提交于 6月 24, 2015

Some processes (CRIU) are moving the vDSO area using the mremap system
call.  As a consequence the kernel reference to the vDSO base address is
no more valid and the signal return frame built once the vDSO has been
moved is not pointing to the new sigreturn address.

This patch handles vDSO remapping and unmapping.
Signed-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: NIngo Molnar <mingo@kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

83d3f0e9

mm: new mm hook framework · 2ae416b1

由 Laurent Dufour 提交于 6月 24, 2015

CRIU is recreating the process memory layout by remapping the checkpointee
memory area on top of the current process (criu).  This includes remapping
the vDSO to the place it has at checkpoint time.

However some architectures like powerpc are keeping a reference to the
vDSO base address to build the signal return stack frame by calling the
vDSO sigreturn service.  So once the vDSO has been moved, this reference
is no more valid and the signal frame built later are not usable.

This patch serie is introducing a new mm hook framework, and a new
arch_remap hook which is called when mremap is done and the mm lock still
hold.  The next patch is adding the vDSO remap and unmap tracking to the
powerpc architecture.

This patch (of 3):

This patch introduces a new set of header file to manage mm hooks:
- per architecture empty header file (arch/x/include/asm/mm-arch-hooks.h)
- a generic header (include/linux/mm-arch-hooks.h)

The architecture which need to overwrite a hook as to redefine it in its
header file, while architecture which doesn't need have nothing to do.

The default hooks are defined in the generic header and are used in the
case the architecture is not defining it.

In a next step, mm hooks defined in include/asm-generic/mm_hooks.h should
be moved here.
Signed-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2ae416b1

mm/hugetlb: reduce arch dependent code about huge_pmd_unshare · e81f2d22

由 Zhang Zhen 提交于 6月 24, 2015

Currently we have many duplicates in definitions of huge_pmd_unshare.  In
all architectures this function just returns 0 when
CONFIG_ARCH_WANT_HUGE_PMD_SHARE is N.

This patch puts the default implementation in mm/hugetlb.c and lets these
architectures use the common code.
Signed-off-by: NZhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Rientjes <rientjes@google.com>
Cc: James Yang <James.Yang@freescale.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e81f2d22

powerpc: use for_each_sg() · 5935877a

由 Akinobu Mita 提交于 6月 24, 2015

This replaces the plain loop over the sglist array with for_each_sg()
macro which consists of sg_next() function calls.  Since powerpc does
select ARCH_HAS_SG_CHAIN, it is necessary to use for_each_sg() in order
to loop over each sg element.  This also help find problems with drivers
that do not properly initialize their sg tables when CONFIG_DEBUG_SG is
enabled.
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5935877a

24 6月, 2015 1 次提交
- A
  make simple_positive() public · dc3f4198
  由 Al Viro 提交于 5月 18, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  dc3f4198
19 6月, 2015 4 次提交

powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma() · 5c89a87d

由 Alexey Kardashevskiy 提交于 6月 18, 2015

When pnv_pci_ioda_fixup() is called during PHB fixup time, each PE in
the sorted list of PEs (phb::pe_dma_list) is iterated to setup the PE's
DMA32 space by pnv_ioda_setup_bus_dma() if the PE's DMA32 weight is bigger
than zero. The function also assigns all the subordinate PCI devices of
the PE's primary bus with the PE's DMA32 IOMMU table. It causes the PCI
devicess in the child PEs, which don't have DMA weight, receives wrong
IOMMU table and then IOMMU group.

The patch fixes above issue by more check on the PE's coverage and don't
assign IOMMU table to those PCI devices, which belong to the child PEs.
The problem was found on Firestone platform initially.
Suggested-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5c89a87d

powerpc/mm: Change the swap encoding in pte. · e1483158

由 Aneesh Kumar K.V 提交于 6月 17, 2015

Current swap encoding in pte can't support large pfns
above 4TB. Change the swap encoding such that we put
the swap type in the PTE bits. Also add build checks
to make sure we don't overlap with HPTEFLAGS.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e1483158

powerpc/mm: PTE_RPN_MAX is not used, remove the same · 02505ceb

由 Aneesh Kumar K.V 提交于 6月 17, 2015

Remove the unused #define
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

02505ceb

powerpc/tm: Abort syscalls in active transactions · b4b56f9e

由 Sam bobroff 提交于 6月 12, 2015

This patch changes the syscall handler to doom (tabort) active
transactions when a syscall is made and return very early without
performing the syscall and keeping side effects to a minimum (no CPU
accounting or system call tracing is performed). Also included is a
new HWCAP2 bit, PPC_FEATURE2_HTM_NOSC, to indicate this
behaviour to userspace.

Currently, the system call instruction automatically suspends an
active transaction which causes side effects to persist when an active
transaction fails.

This does change the kernel's behaviour, but in a way that was
documented as unsupported.  It doesn't reduce functionality as
syscalls will still be performed after tsuspend; it just requires that
the transaction be explicitly suspended.  It also provides a
consistent interface and makes the behaviour of user code
substantially the same across powerpc and platforms that do not
support suspended transactions (e.g. x86 and s390).

Performance measurements using
http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of
a normal (non-aborted) system call increases by about 0.25%.
Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b4b56f9e

18 6月, 2015 4 次提交

powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off · b5926430

由 Alexey Kardashevskiy 提交于 6月 15, 2015

The pnv_pci_ioda2_unset_window() function is used to do the final
cleanup of a DMA window being released:
- via VFIO ioctl by the guest request;
- via unplugging a virtual PCI function.
However the function was under #ifdef CONFIG_IOMMU_API and was missing.

This moves the helper outside of IOMMU_API block and enables it
for either or both IOMMU_API and PCI_IOV.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b5926430

powerpc/include: Add opal-prd to installed uapi headers · cdf2bc1c

由 Jeremy Kerr 提交于 6月 17, 2015

We'll want to build the opal-prd daemon with the prd headers, so include
this in the uapi headers list.
Signed-off-by: NJeremy Kerr <jk@ozlabs.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

cdf2bc1c

powerpc/powernv: fix construction of opal PRD messages · 7185795a

由 Jeremy Kerr 提交于 6月 15, 2015

We currently have a bug in the PRD code, where the contents of an
incoming message (beyond the header) will be overwritten by the list
item manipulations when adding to to the prd_msg_queue.

This change reorders struct opal_prd_msg_queue_item, so that the
message body doesn't overlap the list_head.

We also clarify the memcpy of the message, as we're copying unnecessary
bytes at the end of the message data.
Signed-off-by: NJeremy Kerr <jk@ozlabs.org>
Acked-by: NStewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7185795a

powerpc/powernv: Increase opal-irqchip initcall priority · 02b6505c

由 Alistair Popple 提交于 6月 17, 2015

The eeh subsystem for powernv requires the opal event irqchip to be
initialised prior to initialisation or the following errors are
produced (and eeh doesn't work as expected):

irq: XICS didn't like hwirq-0x9 to VIRQ17 mapping (rc=-22)
pnv_eeh_post_init: Can't request OPAL event interrupt (0)

On powernv eeh is initialised from a subsys_initcall due to a check
for machine_is(powernv) in eeh_init(). This patch increases the
initcall priority of opal_event_init() to an arch_initcall to ensure
the opal event interface is initialised prior to any users of it.
Signed-off-by: NAlistair Popple <alistair@popple.id.au>
Reported-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

02b6505c

17 6月, 2015 5 次提交

powerpc: Make doorbell check preemption safe · 3609d819

由 Shreyas B. Prabhu 提交于 5月 20, 2015

Doorbell can be used to cause ipi on cpus which are sibling threads on
the same core. So icp_native_cause_ipi checks if the destination cpu
is a sibling thread of the current cpu and uses doorbell in such cases.

But while running with CONFIG_PREEMPT=y, since this section is
preemtible, we can run into issues if after we check if the destination
cpu is a sibling cpu, the task gets migrated from a sibling cpu to a
cpu on another core.

Fix this by using get_cpu()/ put_cpu()
Signed-off-by: NShreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3609d819

powerpc: don't use module_init for non-modular core hugetlb code · 6f114281

由 Paul Gortmaker 提交于 5月 01, 2015

The hugetlbpage.o is obj-y (always built in).  It will never
be modular, so using module_init as an alias for __initcall is
somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of arch_initcall (which
makes sense for arch code) will thus change this registration
from level 6-device to level 3-arch (i.e. slightly earlier).
However no observable impact of that small difference has
been observed during testing, or is expected.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

6f114281

powerpc: use subsys_initcall for Freescale Local Bus · 383d14a5

由 Paul Gortmaker 提交于 5月 01, 2015

The FSL_SOC option is bool, and hence this code is either
present or absent.  It will never be modular, so using
module_init as an alias for __initcall is rather misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of subsys_initcall (which
makes sense for bus code) will thus change this registration
from level 6-device to level 4-subsys (i.e. slightly earlier).
However no observable impact of that small difference has
been observed during testing, or is expected.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Scott Wood <scottwood@freescale.com>
Acked-by: NScott Wood <scottwood@freescale.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

383d14a5

powerpc: don't use module_init in non-modular 83xx suspend code · a390a2f1

由 Paul Gortmaker 提交于 5月 01, 2015

The suspend.o is built for SUSPEND -- which is bool, and hence
this code is either present or absent.  It will never be modular,
so using module_init as an alias for __initcall can be somewhat
misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of device_initcall
directly in this change means that the runtime impact is
zero -- it will remain at level 6 in initcall ordering.

Cc: Scott Wood <scottwood@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

a390a2f1

powerpc: use device_initcall for registering rtc devices · 8f6b9512

由 Paul Gortmaker 提交于 5月 01, 2015

Currently these two RTC devices are in core platform code
where it is not possible for them to be modular.  It will
never be modular, so using module_init as an alias for
__initcall can be somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of device_initcall
directly in this change means that the runtime impact is
zero -- they will remain at level 6 in initcall ordering.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Geoff Levand <geoff@infradead.org>
Acked-by: NGeoff Levand <geoff@infradead.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

8f6b9512

15 6月, 2015 1 次提交

powerpc/powernv: pnv_init_idle_states() should only run on powernv · 4bece972

由 Michael Ellerman 提交于 6月 15, 2015

Although this init call checks for device tree properties before doing
anything, it should still only run on powernv machines.
Reviewed-by: NShreyas B Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4bece972

11 6月, 2015 7 次提交

powerpc: Don't use gcc specific options on clang · 238abecd

由 Anton Blanchard 提交于 5月 26, 2015

We have code to choose between several options, eg. -mabi=elfv2 vs
-mcall-aixdesc, and -mcmodel=medium vs -mminimal-toc. But these are all
GCC specific, so use cc-option on all of them.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

238abecd

powerpc: Don't use -mno-strict-align on clang · 92d6cf2d

由 Anton Blanchard 提交于 5月 26, 2015

We added -mno-strict-align in commit f036b368 (powerpc: Work around little
endian gcc bug) to fix gcc bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57134

Clang doesn't understand it. We need to use a conditional because we can't use the
simpler call cc-option here.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

92d6cf2d

powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it · a50a862e

由 Anton Blanchard 提交于 5月 26, 2015

These options are not recognised on LLVM, so use call cc-option to check
for support.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

a50a862e

powerpc: Only use -mabi=altivec if toolchain supports it · 1fb3f5a7

由 Anton Blanchard 提交于 5月 26, 2015

The -mabi=altivec option is not recognised on LLVM, so use call cc-option
to check for support.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1fb3f5a7

powerpc: Fix duplicate const clang warning in user access code · b91c1e3e

由 Anton Blanchard 提交于 5月 26, 2015

We see a large number of duplicate const errors in the user access
code when building with llvm/clang:

  include/linux/pagemap.h:576:8: warning: duplicate 'const' declaration specifier
      [-Wduplicate-decl-specifier]
        ret = __get_user(c, uaddr);

The problem is we are doing const __typeof__(*(ptr)), which will hit the
warning if ptr is marked const.

Removing const does not seem to have any effect on GCC code generation.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b91c1e3e

vfio: powerpc/spapr: Support Dynamic DMA windows · e633bc86

由 Alexey Kardashevskiy 提交于 6月 05, 2015

This adds create/remove window ioctls to create and remove DMA windows.
sPAPR defines a Dynamic DMA windows capability which allows
para-virtualized guests to create additional DMA windows on a PCI bus.
The existing linux kernels use this new window to map the entire guest
memory and switch to the direct DMA operations saving time on map/unmap
requests which would normally happen in a big amounts.

This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and
VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows.
Up to 2 windows are supported now by the hardware and by this driver.

This changes VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional
information such as a number of supported windows and maximum number
levels of TCE tables.

DDW is added as a capability, not as a SPAPR TCE IOMMU v2 unique feature
as we still want to support v2 on platforms which cannot do DDW for
the sake of TCE acceleration in KVM (coming soon).
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: NAlex Williamson <alex.williamson@redhat.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e633bc86

vfio: powerpc/spapr: Register memory and define IOMMU v2 · 2157e7b8

由 Alexey Kardashevskiy 提交于 6月 05, 2015

The existing implementation accounts the whole DMA window in
the locked_vm counter. This is going to be worse with multiple
containers and huge DMA windows. Also, real-time accounting would requite
additional tracking of accounted pages due to the page size difference -
IOMMU uses 4K pages and system uses 4K or 64K pages.

Another issue is that actual pages pinning/unpinning happens on every
DMA map/unmap request. This does not affect the performance much now as
we spend way too much time now on switching context between
guest/userspace/host but this will start to matter when we add in-kernel
DMA map/unmap acceleration.

This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
2 new ioctls to register/unregister DMA memory -
VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
which receive user space address and size of a memory region which
needs to be pinned/unpinned and counted in locked_vm.
New IOMMU splits physical pages pinning and TCE table update
into 2 different operations. It requires:
1) guest pages to be registered first
2) consequent map/unmap requests to work only with pre-registered memory.
For the default single window case this means that the entire guest
(instead of 2GB) needs to be pinned before using VFIO.
When a huge DMA window is added, no additional pinning will be
required, otherwise it would be guest RAM + 2GB.

The new memory registration ioctls are not supported by
VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
will require memory to be preregistered in order to work.

The accounting is done per the user process.

This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
can do with v1 or v2 IOMMUs.

In order to support memory pre-registration, we need a way to track
the use of every registered memory region and only allow unregistration
if a region is not in use anymore. So we need a way to tell from what
region the just cleared TCE was from.

This adds a userspace view of the TCE table into iommu_table struct.
It contains userspace address, one per TCE entry. The table is only
allocated when the ownership over an IOMMU group is taken which means
it is only used from outside of the powernv code (such as VFIO).

As v2 IOMMU supports IODA2 and pre-IODA2 IOMMUs (which do not support
DDW API), this creates a default DMA window for IODA2 for consistency.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
[aw: for the vfio related changes]
Acked-by: NAlex Williamson <alex.williamson@redhat.com>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2157e7b8

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功