提交 · 8b23de29489fd63fce753db9d53055e4bbf8f616 · openeuler / raspberrypi-kernel

28 8月, 2013 5 次提交

KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls · 8b23de29

由 Paul Mackerras 提交于 8月 06, 2013

It turns out that if we exit the guest due to a hcall instruction (sc 1),
and the loading of the instruction in the guest exit path fails for any
reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the
instruction after the hcall instruction rather than the hcall itself.
This in turn means that the instruction doesn't get recognized as an
hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel
as a sc instruction.  That usually results in the guest kernel getting
a return code of 38 (ENOSYS) from an hcall, which often triggers a
BUG_ON() or other failure.

This fixes the problem by adding a new variant of kvmppc_get_last_inst()
called kvmppc_get_last_sc(), which fetches the instruction if necessary
from pc - 4 rather than pc.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

8b23de29

KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX · 9d1ffdd8

由 Paul Mackerras 提交于 8月 06, 2013

Currently the code assumes that once we load up guest FP/VSX or VMX
state into the CPU, it stays valid in the CPU registers until we
explicitly flush it to the thread_struct.  However, on POWER7,
copy_page() and memcpy() can use VMX.  These functions do flush the
VMX state to the thread_struct before using VMX instructions, but if
this happens while we have guest state in the VMX registers, and we
then re-enter the guest, we don't reload the VMX state from the
thread_struct, leading to guest corruption.  This has been observed
to cause guest processes to segfault.

To fix this, we check before re-entering the guest that all of the
bits corresponding to facilities owned by the guest, as expressed
in vcpu->arch.guest_owned_ext, are set in current->thread.regs->msr.
Any bits that have been cleared correspond to facilities that have
been used by kernel code and thus flushed to the thread_struct, so
for them we reload the state from the thread_struct.

We also need to check current->thread.regs->msr before calling
giveup_fpu() or giveup_altivec(), since if the relevant bit is
clear, the state has already been flushed to the thread_struct and
to flush it again would corrupt it.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

9d1ffdd8

KVM: PPC: Book3S: Fix compile error in XICS emulation · 7bfa9ad5

由 Paul Mackerras 提交于 8月 06, 2013

Commit 8e44ddc3 ("powerpc/kvm/book3s: Add support for H_IPOLL and
H_XIRR_X in XICS emulation") added a call to get_tb() but didn't
include the header that defines it, and on some configs this means
book3s_xics.c fails to compile:

arch/powerpc/kvm/book3s_xics.c: In function ‘kvmppc_xics_hcall’:
arch/powerpc/kvm/book3s_xics.c:812:3: error: implicit declaration of function ‘get_tb’ [-Werror=implicit-function-declaration]

Cc: stable@vger.kernel.org [v3.10, v3.11]
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

7bfa9ad5

KVM: PPC: Book3S PR: return appropriate error when allocation fails · 7c7b406e

由 Thadeu Lima de Souza Cascardo 提交于 7月 17, 2013

err was overwritten by a previous function call, and checked to be 0. If
the following page allocation fails, 0 is going to be returned instead
of -ENOMEM.
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

7c7b406e

arch: powerpc: kvm: add signed type cast for comparation · 5d226ae5

由 Chen Gang 提交于 7月 22, 2013

'rmls' is 'unsigned long', lpcr_rmls() will return negative number when
failure occurs, so it need a type cast for comparing.

'lpid' is 'unsigned long', kvmppc_alloc_lpid() return negative number
when failure occurs, so it need a type cast for comparing.
Signed-off-by: NChen Gang <gang.chen@asianux.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

5d226ae5

23 8月, 2013 1 次提交

powerpc/kvm: Copy the pvr value after memset · 87916442

由 Aneesh Kumar K.V 提交于 8月 22, 2013

Otherwise we would clear the pvr value
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

87916442

25 7月, 2013 1 次提交

KVM: PPC: Book3S PR: Load up SPRG3 register with guest value on guest entry · c8ae0ace

由 Paul Mackerras 提交于 7月 11, 2013

Unlike the other general-purpose SPRs, SPRG3 can be read by usermode
code, and is used in recent kernels to store the CPU and NUMA node
numbers so that they can be read by VDSO functions.  Thus we need to
load the guest's SPRG3 value into the real SPRG3 register when entering
the guest, and restore the host's value when exiting the guest.  We don't
need to save the guest SPRG3 value when exiting the guest as usermode
code can't modify SPRG3.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

c8ae0ace

11 7月, 2013 2 次提交

kvm/ppc/booke: Don't call kvm_guest_enter twice · c09ec198

由 Scott Wood 提交于 7月 10, 2013

kvm_guest_enter() was already called by kvmppc_prepare_to_enter().
Don't call it again.
Signed-off-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

c09ec198

kvm/ppc: Call trace_hardirqs_on before entry · 5f1c248f

由 Scott Wood 提交于 7月 10, 2013

Currently this is only being done on 64-bit.  Rather than just move it
out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is
consistent with lazy ee state, and so that we don't track more host
code as interrupts-enabled than necessary.

Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect
that this function now has a role on 32-bit as well.
Signed-off-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

5f1c248f

10 7月, 2013 2 次提交

KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers · 4baa1d87

由 Paul Mackerras 提交于 7月 08, 2013

The table of offsets to real-mode hcall handlers in book3s_hv_rmhandlers.S
can contain negative values, if some of the handlers end up before the
table in the vmlinux binary. Thus we need to use a sign-extending load
to read the values in the table rather than a zero-extending load.
Without this, the host crashes when the guest does one of the hcalls
with negative offsets, due to jumping to a bogus address.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

4baa1d87

KVM: PPC: Book3S HV: Correct tlbie usage · 54480501

由 Paul Mackerras 提交于 7月 08, 2013

This corrects the usage of the tlbie (TLB invalidate entry) instruction
in HV KVM. The tlbie instruction changed between PPC970 and POWER7.
On the PPC970, the bit to select large vs. small page is in the instruction,
not in the RB register value. This changes the code to use the correct
form on PPC970.

On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
field of the RB value incorrectly for 64k pages. This fixes it.

Since we now have several cases to handle for the tlbie instruction, this
factors out the code to do a sequence of tlbies into a new function,
do_tlbies(), and calls that from the various places where the code was
doing tlbie instructions inline. It also makes kvmppc_h_bulk_remove()
use the same global_invalidates() function for determining whether to do
local or global TLB invalidations as is used in other places, for
consistency, and also to make sure that kvm->arch.need_tlb_flush gets
updated properly.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

54480501

08 7月, 2013 4 次提交

powerpc/kvm: Use 256K chunk to track both RMA and hash page table allocation. · 990978e9

由 Aneesh Kumar K.V 提交于 7月 02, 2013

Both RMA and hash page table request will be a multiple of 256K. We can use
a chunk size of 256K to track the free/used 256K chunk in the bitmap. This
should help to reduce the bitmap size.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

990978e9

powerpc/kvm: Contiguous memory allocator based RMA allocation · 6c45b810

由 Aneesh Kumar K.V 提交于 7月 02, 2013

Older version of power architecture use Real Mode Offset register and Real Mode Limit
Selector for mapping guest Real Mode Area. The guest RMA should be physically
contigous since we use the range when address translation is not enabled.

This patch switch RMA allocation code to use contigous memory allocator. The patch
also remove the the linear allocator which not used any more
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

6c45b810

powerpc/kvm: Contiguous memory allocator based hash page table allocation · fa61a4e3

由 Aneesh Kumar K.V 提交于 7月 02, 2013

Powerpc architecture uses a hash based page table mechanism for mapping virtual
addresses to physical address. The architecture require this hash page table to
be physically contiguous. With KVM on Powerpc currently we use early reservation
mechanism for allocating guest hash page table. This implies that we need to
reserve a big memory region to ensure we can create large number of guest
simultaneously with KVM on Power. Another disadvantage is that the reserved memory
is not available to rest of the subsystems and and that implies we limit the total
available memory in the host.

This patch series switch the guest hash page table allocation to use
contiguous memory allocator.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

fa61a4e3

KVM: PPC: Book3S: Ignore DABR register · f3532028

由 Alexander Graf 提交于 7月 02, 2013

We don't emulate breakpoints yet, so just ignore reads and writes
to / from DABR.

This fixes booting of more recent Linux guest kernels for me.
Reported-by: NNello Martuscielli <ppc.addon@gmail.com>
Tested-by: NNello Martuscielli <ppc.addon@gmail.com>
Signed-off-by: NAlexander Graf <agraf@suse.de>

f3532028

30 6月, 2013 9 次提交

powerpc/eeh: Fix fetching bus for single-dev-PE · ea461abf

由 Gavin Shan 提交于 6月 05, 2013

While running Linux as guest on top of phyp, we possiblly have
PE that includes single PCI device. However, we didn't return
its PCI bus correctly and it leads to failure on recovery from
EEH errors for single-dev-PE. The patch fixes the issue.

Cc: <stable@vger.kernel.org> # v3.7+
Cc: Steve Best <sbest@us.ibm.com>
Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

ea461abf

KVM: PPC: Ignore PIR writes · a3ff5fbc

由 Alexander Graf 提交于 6月 27, 2013

While technically it's legal to write to PIR and have the identifier changed,
we don't implement logic to do so because we simply expose vcpu_id to the guest.

So instead, let's ignore writes to PIR. This ensures that we don't inject faults
into the guest for something the guest is allowed to do. While at it, we cross
our fingers hoping that it also doesn't mind that we broke its PIR read values.
Signed-off-by: NAlexander Graf <agraf@suse.de>

a3ff5fbc

KVM: PPC: Book3S PR: Invalidate SLB entries properly · 681562cd

由 Paul Mackerras 提交于 6月 22, 2013

At present, if the guest creates a valid SLB (segment lookaside buffer)
entry with the slbmte instruction, then invalidates it with the slbie
instruction, then reads the entry with the slbmfee/slbmfev instructions,
the result of the slbmfee will have the valid bit set, even though the
entry is not actually considered valid by the host. This is confusing,
if not worse. This fixes it by zeroing out the orige and origv fields
of the SLB entry structure when the entry is invalidated.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

681562cd

KVM: PPC: Book3S PR: Allow guest to use 1TB segments · 0f296829

由 Paul Mackerras 提交于 6月 22, 2013

With this, the guest can use 1TB segments as well as 256MB segments.
Since we now have the situation where a single emulated guest segment
could correspond to multiple shadow segments (as the shadow segments
are still 256MB segments), this adds a new kvmppc_mmu_flush_segment()
to scan for all shadow segments that need to be removed.

This restructures the guest HPT (hashed page table) lookup code to
use the correct hashing and matching functions for HPTEs within a
1TB segment. We use the standard hpt_hash() function instead of
open-coding the hash calculation, and we use HPTE_V_COMPARE() with
an AVPN value that has the B (segment size) field included. The
calculation of avpn is done a little earlier since it doesn't change
in the loop starting at the do_second label.

The computation in kvmppc_mmu_book3s_64_esid_to_vsid() changes so that
it returns a 256MB VSID even if the guest SLB entry is a 1TB entry.
This is because the users of this function are creating 256MB SLB
entries. We set a new VSID_1T flag so that entries created from 1T
segments don't collide with entries from 256MB segments.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

0f296829

KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match · 6ed1485f

由 Paul Mackerras 提交于 6月 22, 2013

The loop in kvmppc_mmu_book3s_64_xlate() that looks up a translation
in the guest hashed page table (HPT) keeps going if it finds an
HPTE that matches but doesn't allow access.  This is incorrect; it
is different from what the hardware does, and there should never be
more than one matching HPTE anyway.  This fixes it to stop when any
matching HPTE is found.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

6ed1485f

KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry · bc1bc4e3

由 Paul Mackerras 提交于 6月 22, 2013

On entering a PR KVM guest, we invalidate the whole SLB before loading
up the guest entries. We do this using an slbia instruction, which
invalidates all entries except entry 0, followed by an slbie to
invalidate entry 0. However, the slbie turns out to be ineffective
in some circumstances (specifically when the host linear mapping uses
64k pages) because of errors in computing the parameter to the slbie.
The result is that the guest kernel hangs very early in boot because
it takes a DSI the first time it tries to access kernel data using
a linear mapping address in real mode.

Currently we construct bits 36 - 43 (big-endian numbering) of the slbie
parameter by taking bits 56 - 63 of the SLB VSID doubleword. These bits
for the tlbie are C (class, 1 bit), B (segment size, 2 bits) and 5
reserved bits. For the SLB VSID doubleword these are C (class, 1 bit),
reserved (1 bit), LP (large page size, 2 bits), and 4 reserved bits.
Thus we are not setting the B field correctly, and when LP = 01 as
it is for 64k pages, we are setting a reserved bit.

Rather than add more instructions to calculate the slbie parameter
correctly, this takes a simpler approach, which is to set entry 0 to
zeroes explicitly. Normally slbmte should not be used to invalidate
an entry, since it doesn't invalidate the ERATs, but it is OK to use
it to invalidate an entry if it is immediately followed by slbia,
which does invalidate the ERATs. (This has been confirmed with the
Power architects.) This approach takes fewer instructions and will
work whatever the contents of entry 0.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

bc1bc4e3

KVM: PPC: Book3S PR: Fix proto-VSID calculations · 8ed7b7e9

由 Paul Mackerras 提交于 6月 22, 2013

This makes sure the calculation of the proto-VSIDs used by PR KVM
is done with 64-bit arithmetic. Since vcpu3s->context_id[] is int,
when we do vcpu3s->context_id[0] << ESID_BITS the shift will be done
with 32-bit instructions, possibly leading to significant bits
getting lost, as the context id can be up to 524283 and ESID_BITS is
18. To fix this we cast the context id to u64 before shifting.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

8ed7b7e9

KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL · 5f17ce8b

由 Tiejun Chen 提交于 5月 13, 2013

Availablity of the doorbell_exception function is guarded by
CONFIG_PPC_DOORBELL. Use the same define to guard our caller
of it.
Signed-off-by: NTiejun Chen <tiejun.chen@windriver.com>
[agraf: improve patch description]
Signed-off-by: NAlexander Graf <agraf@suse.de>

5f17ce8b

powerpc/pci: Improve device hotplug initialization · 7846de40

由 Guenter Roeck 提交于 6月 10, 2013

Commit 37f02195 (powerpc/pci: fix PCI-e devices rescan issue on powerpc
platform) fixes a problem with interrupt and DMA initialization on hot
plugged devices. With this commit, interrupt and DMA initialization for
hot plugged devices is handled in the pci device enable function.

This approach has a couple of drawbacks. First, it creates two code paths
for device initialization, one for hot plugged devices and another for devices
known during the initial PCI scan. Second, the initialization code for hot
plugged devices is only called when the device is enabled, ie typically
in the probe function. Also, the platform specific setup code is called each
time pci_enable_device() is called, not only once during device discovery,
meaning it is actually called multiple times, once for devices discovered
during the initial scan and again each time a driver is re-loaded.

The visible result is that interrupt pins are only assigned to hot plugged
devices when the device driver is loaded. Effectively this changes the PCI
probe API, since pci_dev->irq and the device's dma configuration will now
only be valid after pci_enable() was called at least once. A more subtle
change is that platform specific PCI device setup is moved from device
discovery into the driver's probe function, more specifically into the
pci_enable_device() call.

To fix the inconsistencies, add new function pcibios_add_device.
Call pcibios_setup_device from pcibios_setup_bus_devices if device setup
is not complete, and from pcibios_add_device if bus setup is complete.

With this change, device setup code is moved back into device initialization,
and called exactly once for both static and hot plugged devices.

[ This also fixes a regression introduced by the above patch which
  causes dev->irq to be overwritten under some cirumstances after
  MSIs have been enabled for the device which leads to crashes due
  to the MSI core "hijacking" dev->irq to store the base MSI number
  and not the LSI. --BenH
]

Cc: Yuanquan Chen <Yuanquan.Chen@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hiroo Matsumoto <matsumoto.hiroo@jp.fujitsu.com>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7846de40

29 6月, 2013 3 次提交
- A
  proc_powerpc: switch to fixed_size_llseek() · b33159b7
  由 Al Viro 提交于 6月 23, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  b33159b7
- A
  [readdir] switch dcache_readdir() users to ->iterate() · 5f99f4e7
  由 Al Viro 提交于 5月 15, 2013
```
new helpers - dir_emit_dot(file, ctx, dentry), dir_emit_dotdot(file, ctx),
dir_emit_dots(file, ctx).
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  5f99f4e7
- A
  consolidate io_remap_pfn_range definitions · 40d158e6
  由 Al Viro 提交于 5月 11, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  40d158e6
28 6月, 2013 1 次提交

powerpc/eeh: Add eeh_dev to the cache during boot · 1abd6018

由 Thadeu Lima de Souza Cascardo 提交于 6月 27, 2013

commit f8f7d63f ("powerpc/eeh: Trace eeh
device from I/O cache") broke EEH on pseries for devices that were
present during boot and have not been hotplugged/DLPARed.

eeh_check_failure will get the eeh_dev from the cache, and will get
NULL. eeh_addr_cache_build adds the addresses to the cache, but eeh_dev
for the giving pci_device is not set yet. Just reordering the call to
eeh_addr_cache_insert_dev works fine. The ordering is similar to the one
in eeh_add_device_late.
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Acked-by: NGavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

1abd6018

26 6月, 2013 1 次提交

arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not · a41b56ef

由 Maarten Lankhorst 提交于 6月 20, 2013

This will allow me to call functions that have multiple
arguments if fastpath fails. This is required to support ticket
mutexes, because they need to be able to pass an extra argument
to the fail function.

Originally I duplicated the functions, by adding
__mutex_fastpath_lock_retval_arg. This ended up being just a
duplication of the existing function, so a way to test if
fastpath was called ended up being better.

This also cleaned up the reservation mutex patch some by being
able to call an atomic_set instead of atomic_xchg, and making it
easier to detect if the wrong unlock function was previously
used.
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: robclark@gmail.com
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113105.4001.83929.stgit@patserSigned-off-by: NIngo Molnar <mingo@kernel.org>

a41b56ef

25 6月, 2013 1 次提交

powerpc/pci: Fix boot panic on mpc83xx (regression) · b37e1613

由 Rojhalat Ibrahim 提交于 6月 17, 2013

The following commit caused a fatal oops when booting on mpc83xx with
a non-express PCI bus (regardless of whether a PCI device is present):

commit 50d8f87d
Author: Rojhalat Ibrahim <imr@rtschenk.de>
Date:   Mon Apr 8 10:15:28 2013 +0200

    powerpc/fsl-pci Make PCIe hotplug work with Freescale PCIe controllers

    Up to now the PCIe link status on Freescale PCIe controllers was only
    checked once at boot time. So hotplug did not work. With this patch the
    link status is checked on every config read. PCIe devices not present at
    boot time are found after doing 'echo 1 >/sys/bus/pci/rescan'.
Signed-off-by: NRojhalat Ibrahim <imr@rtschenk.de>
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

This patch fixes the issue by calling setup_indirect_pci for all device types.
fsl_indirect_read_config is now only used for booke/86xx PCIe controllers.
Reported-by: NMichael Guntsche <mike@it-loops.com>
Cc: Scott Wood <scottwood@freescale.com>
Signed-off-by: NRojhalat Ibrahim <imr@rtschenk.de>
Signed-off-by: NScott Wood <scottwood@freescale.com>

b37e1613

20 6月, 2013 1 次提交

powerpc: Fix bad pmd error with book3E config · 8bbd9f04

由 Aneesh Kumar K.V 提交于 6月 19, 2013

Book3E uses the hugepd at PMD level and don't encode pte directly
at the pmd level. So it will find the lower bits of pmd set
and the pmd_bad check throws error. Infact the current code
will never take the free_hugepd_range call at all because it will
clear the pmd if it find a hugepd pointer. Fix this by clearing
bad pmd only if it is not a hugepd pointer.

This is regression introduced by e2b3d202
"powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format"
Reported-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

8bbd9f04

19 6月, 2013 2 次提交

sched: Rename sched.c as sched/core.c in comments and Documentation · 0a0fca9d

由 Viresh Kumar 提交于 6月 04, 2013

Most of the stuff from kernel/sched.c was moved to kernel/sched/core.c long time
back and the comments/Documentation never got updated.

I figured it out when I was going through sched-domains.txt and so thought of
fixing it globally.

I haven't crossed check if the stuff that is referenced in sched/core.c by all
these files is still present and hasn't changed as that wasn't the motive behind
this patch.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/cdff76a265326ab8d71922a1db5be599f20aad45.1370329560.git.viresh.kumar@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

0a0fca9d

kvm/ppc/booke: Delay kvmppc_lazy_ee_enable · f8941fbe

由 Scott Wood 提交于 6月 11, 2013

kwmppc_lazy_ee_enable() should be called as late as possible,
or else we get things like WARN_ON(preemptible()) in enable_kernel_fp()
in configurations where preemptible() works.

Note that book3s_pr already waits until just before __kvmppc_vcpu_run
to call kvmppc_lazy_ee_enable().
Signed-off-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f8941fbe

15 6月, 2013 3 次提交

powerpc: Fix missing/delayed calls to irq_work · 230b3034

由 Benjamin Herrenschmidt 提交于 6月 15, 2013

When replaying interrupts (as a result of the interrupt occurring
while soft-disabled), in the case of the decrementer, we are exclusively
testing for a pending timer target. However we also use decrementer
interrupts to trigger the new "irq_work", which in this case would
be missed.

This change the logic to force a replay in both cases of a timer
boundary reached and a decrementer interrupt having actually occurred
while disabled. The former test is still useful to catch cases where
a CPU having been hard-disabled for a long time completely misses the
interrupt due to a decrementer rollover.

CC: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: NSteven Rostedt <rostedt@goodmis.org>

230b3034

powerpc: Fix emulation of illegal instructions on PowerNV platform · bf593907

由 Paul Mackerras 提交于 6月 14, 2013

Normally, the kernel emulates a few instructions that are unimplemented
on some processors (e.g. the old dcba instruction), or privileged (e.g.
mfpvr). The emulation of unimplemented instructions is currently not
working on the PowerNV platform. The reason is that on these machines,
unimplemented and illegal instructions cause a hypervisor emulation
assist interrupt, rather than a program interrupt as on older CPUs.
Our vector for the emulation assist interrupt just calls
program_check_exception() directly, without setting the bit in SRR1
that indicates an illegal instruction interrupt. This fixes it by
making the emulation assist interrupt set that bit before calling
program_check_interrupt(). With this, old programs that use no-longer
implemented instructions such as dcba now work again.

CC: <stable@vger.kernel.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

bf593907

powerpc: Fix stack overflow crash in resume_kernel when ftracing · 0e37739b

由 Michael Ellerman 提交于 6月 13, 2013

It's possible for us to crash when running with ftrace enabled, eg:

  Bad kernel stack pointer bffffd12 at c00000000000a454
  cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
      pc: c00000000000a454: resume_kernel+0x34/0x60
      lr: c00000000000335c: performance_monitor_common+0x15c/0x180
      sp: bffffd12
     msr: 8000000000001032
     dar: bffffd12
   dsisr: 42000000

If we look at current's stack (paca->__current->stack) we see it is
equal to c0000002ecab0000. Our stack is 16K, and comparing to
paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
kernel stack. This leads to us writing over our struct thread_info, and
in this case we have corrupted thread_info->flags and set
_TIF_EMULATE_STACK_STORE.

Dumping the stack we see:

  3:mon> t c0000002ecab0000
  [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
  [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
  --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
  [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
  [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
  [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
  [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
  [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
  [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
  [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
  --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
  [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
  [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
  [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
  [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
  [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
  [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
  --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0

  ... and so on

__ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
path. At that point the irq state is not consistent, ie. interrupts are
hard disabled (by the exception entry), but the paca soft-enabled flag
may be out of sync.

This leads to the local_irq_restore() in trace_graph_entry() actually
enabling interrupts, which we do not want. Because we have not yet
reprogrammed the decrementer we immediately take another decrementer
exception, and recurse.

The fix is twofold. Firstly make sure we call DISABLE_INTS before
calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
the irq state in the paca with the hardware, making it safe again to
call local_irq_save/restore().

Although that should be sufficient to fix the bug, we also mark the
runlatch routines as notrace. They are called very early in the
exception entry and we are asking for trouble tracing them. They are
also fairly uninteresting and tracing them just adds unnecessary
overhead.

[ This regression was introduced by fe1952fc
  "powerpc: Rework runlatch code" by myself --BenH
]

CC: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

0e37739b

11 6月, 2013 4 次提交

kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit() · 7c11c0cc

由 Scott Wood 提交于 6月 06, 2013

EE is hard-disabled on entry to kvmppc_handle_exit(), so call
hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and soft_enabled
is unset.

Without this, we get warnings such as arch/powerpc/kernel/time.c:300,
and sometimes host kernel hangs.
Signed-off-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

7c11c0cc

kvm/ppc/booke: Hold srcu lock when calling gfn functions · f1e89028

由 Scott Wood 提交于 6月 06, 2013

KVM core expects arch code to acquire the srcu lock when calling
gfn_to_memslot and similar functions.
Signed-off-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

f1e89028

kvm/ppc/booke64: Disable e6500 support · 2b6398fc

由 Scott Wood 提交于 6月 06, 2013

The previous patch made 64-bit booke KVM build again, but Altivec
support is still not complete, and we can't prevent the guest from
turning on Altivec (which can corrupt host state until state
save/restore is implemented).  Disable e6500 on KVM until this is
fixed.
Signed-off-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

2b6398fc

kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage · 4edd1ae9

由 Mihai Caraman 提交于 6月 06, 2013

Interrupt numbers defined for Book3E follows IVORs definition. Align
BOOKE_INTERRUPT_ALTIVEC_UNAVAIL and BOOKE_INTERRUPT_ALTIVEC_ASSIST to this
rule which also fixes the build breakage.
IVORs 32 and 33 are shared so reflect this in the interrupts naming.

This fixes a build break for 64-bit booke KVM.
Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

4edd1ae9