提交 · f11fe5524aa7427ae8f23dbede8a29fc4f613a71 · openanolis / cloud-kernel

07 12月, 2011 1 次提交

powerpc/powernv: Update OPAL interfaces · f11fe552

由 Benjamin Herrenschmidt 提交于 11月 29, 2011

This adds some more interfaces for OPAL v2
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

f11fe552

28 11月, 2011 1 次提交

powerpc: Implement CONFIG_STRICT_DEVMEM · 1d54cf2b

由 sukadev@linux.vnet.ibm.com 提交于 8月 30, 2011

As described in the help text in the patch, this token restricts general
access to /dev/mem as a way of increasing the security. Specifically, access
to exclusive IOMEM and kernel RAM is denied unless CONFIG_STRICT_DEVMEM is
set to 'n'.

Implement the 'devmem_is_allowed()' interface for Powerpc. It will be
called from range_is_allowed() when userpsace attempts to access /dev/mem.

This patch is based on an earlier patch from Steve Best and with input from
Paul Mackerras and Scott Wood.

[BenH] Fixed a typo or two and removed the generic change which should
       be submitted as a separate patch
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

1d54cf2b

25 11月, 2011 9 次提交

powerpc/powernv: PCI support for p7IOC under OPAL v2 · 184cd4a3

由 Benjamin Herrenschmidt 提交于 11月 15, 2011

This adds support for p7IOC (and possibly other IODA v1 IO Hubs)
using OPAL v2 interfaces.

We completely take over resource assignment and assign them using an
algorithm that hands out device BARs in a way that makes them fit in
individual segments of the M32 window of the bridge, which enables us
to assign individual PEs to devices and functions.

The current implementation gives out a PE per functions on PCIe, and a
PE for the entire bridge for PCIe to PCI-X bridges.

This can be adjusted / fine tuned later.

We also setup DMA resources (32-bit only for now) and MSIs (both 32-bit
and 64-bit MSI are supported).

The DMA allocation tries to divide the available 256M segments of the
32-bit DMA address space "fairly" among PEs. This is done using a
"weight" heuristic which assigns less value to things like OHCI USB
controllers than, for example SCSI RAID controllers. This algorithm
will probably want some fine tuning for specific devices or device
types.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

184cd4a3

B
powerpc/powernv: Add TCE SW invalidation support · 1f1616e8
由 Benjamin Herrenschmidt 提交于 11月 06, 2011
```
This is used for newer IO Hubs such as p7IOC.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
```
1f1616e8

powerpc/pci: Add a platform hook after probe and before resource survey · 491b98c3

由 Benjamin Herrenschmidt 提交于 11月 06, 2011

Some platforms need to perform resource allocation using a custom algorithm
due to HW constraints, or may want to tweak things globally below a host
bridge. For example OPAL support for IODA will need to perform a
resource allocation pass that applies IODA specific segmentation
constraints to MMIO which cannot be done simply using the kernel generic
resource management code.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

491b98c3

powerpc: Add pgprot_cached_noncoherent() · 09c188c4

由 Geoff Thorpe 提交于 10月 27, 2011

This adds a pgprot combination required by some cache-enabled IO device
mappings, such as Freescale datapath (QMan and BMan) portals.
Signed-off-by: NGeoff Thorpe <geoff@geoffthorpe.net>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

09c188c4

powerpc/pseries: Cancel RTAS event scan before firmware flash · df17f56d

由 Ravi K. Nittala 提交于 10月 03, 2011

The RTAS firmware flash update is conducted using an RTAS call that is
serialized by lock_rtas() which uses spin_lock. While the flash is in
progress, rtasd performs scan for any RTAS events that are generated by
the system. rtasd keeps scanning for the RTAS events generated on the
machine. This is performed via workqueue mechanism. The rtas_event_scan()
also uses an RTAS call to scan the events, eventually trying to acquire
the spin_lock before issuing the request.

The flash update takes a while to complete and during this time, any other
RTAS call has to wait. In this case, rtas_event_scan() waits for a long time
on the spin_lock resulting in a soft lockup.

Fix: Just before the flash update is performed, the queued rtas_event_scan()
work item is cancelled from the work queue so that there is no other RTAS
call issued while the flash is in progress. After the flash completes, the
system reboots and the rtas_event_scan() is rescheduled.
Signed-off-by: NSuzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: NRavi Nittala <ravi.nittala@in.ibm.com>
Reported-by: NDivya Vikas <divya.vikas@in.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

df17f56d

powerpc/book3e: Add ICSWX/ACOP support to Book3e cores like A2 · fac26ad4

由 Jimi Xenidis 提交于 9月 29, 2011

ICSWX is also used by the A2 processor to access coprocessors,
although not all "chips" that contain A2s have coprocessors.
Signed-off-by: NJimi Xenidis <jimix@pobox.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fac26ad4

powerpc/pseries: Software invalidatation of TCEs · 8d3d589a

由 Milton Miller 提交于 6月 29, 2011

Some pseries IOMMUs cache TCEs but don't snoop when the TCEs are changed
in memory, hence we need manually invalidate in software.

This adds code to do the invalidate.  It keys off a device tree property
to say where the to do the MMIO for the invalidate and some information
on what the format of the invalidate including some magic routing info.

it_busno get overloaded with this magic routing info and it_index with
the MMIO address for the invalidate command.

This then gets hooked into the building and freeing of TCEs.

This is only useful on bare metal pseries.  pHyp takes care of this when
virtualised.

Based on patch from Milton with cleanups from Mikey.
Signed-off-by: NMilton Miller <miltonm@bga.com>
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

8d3d589a

powerpc/time: Optimise decrementer_check_overflow · 7df10275

由 Anton Blanchard 提交于 11月 23, 2011

decrementer_check_overflow is called from arch_local_irq_restore so
we want to make it as light weight as possible. As such, turn
decrementer_check_overflow into an inline function.

To avoid a circular mess of includes, separate out the two components
of struct decrementer_clock and keep the struct clock_event_device
part local to time.c.

The fast path improves from:

arch_local_irq_restore
     0:       mflr    r0
     4:       std     r0,16(r1)
     8:       stdu    r1,-112(r1)
     c:       stb     r3,578(r13)
    10:       cmpdi   cr7,r3,0
    14:       beq-    cr7,24 <.arch_local_irq_restore+0x24>
...
    24:       addi    r1,r1,112
    28:       ld      r0,16(r1)
    2c:       mtlr    r0
    30:       blr

to:

arch_local_irq_restore
    0:       std     r30,-16(r1)
    4:       ld      r30,0(r2)
    8:       stb     r3,578(r13)
    c:       cmpdi   cr7,r3,0
   10:       beq-    cr7,6c <.arch_local_irq_restore+0x6c>
...
   6c:       ld      r30,-16(r1)
   70:       blr

Unfortunately we still setup a local TOC (due to -mminimal-toc). Yet
another sign we should be moving to -mcmodel=medium.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7df10275

powerpc/time: Handle wrapping of decrementer · 37fb9a02

由 Anton Blanchard 提交于 11月 23, 2011

When re-enabling interrupts we have code to handle edge sensitive
decrementers by resetting the decrementer to 1 whenever it is negative.
If interrupts were disabled long enough that the decrementer wrapped to
positive we do nothing. This means interrupts can be delayed for a long
time until it finally goes negative again.

While we hope interrupts are never be disabled long enough for the
decrementer to go positive, we have a very good test team that can
drive any kernel into the ground. The softlockup data we get back
from these fails could be seconds in the future, completely missing
the cause of the lockup.

We already keep track of the timebase of the next event so use that
to work out if we should trigger a decrementer exception.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Cc: stable@kernel.org
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

37fb9a02

24 11月, 2011 1 次提交

powerpc/85xx: Add lbc suspend support for PM · 09cef8bd

由 Jia Hongtao 提交于 11月 21, 2011

Power supply for LBC registers is off when system go to deep-sleep state.
We save the values of registers before suspend and restore to registers
after resume.

We removed the last two reservation arrays from struct fsl_lbc_regs for
allocating less memory and minimizing the memcpy size.
Signed-off-by: NJiang Yutang <b14898@freescale.com>
Signed-off-by: NJia Hongtao <B38951@freescale.com>
Signed-off-by: NLi Yang <leoli@freescale.com>
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

09cef8bd

17 11月, 2011 3 次提交

Revert "KVM: PPC: Add support for explicit HIOR setting" · bb75c627

由 Alexander Graf 提交于 11月 17, 2011

This reverts commit a15bd354.

It exceeded the padding on the SREGS struct, rendering the ABI
backwards-incompatible.

Conflicts:

	arch/powerpc/kvm/powerpc.c
	include/linux/kvm.h
Signed-off-by: NAvi Kivity <avi@redhat.com>

bb75c627

powerpc: Fix atomic_xxx_return barrier semantics · b97021f8

由 Benjamin Herrenschmidt 提交于 11月 15, 2011

The Documentation/memory-barriers.txt document requires that atomic
operations that return a value act as a memory barrier both before
and after the actual atomic operation.

Our current implementation doesn't guarantee this. More specifically,
while a load following the isync can not be issued before stwcx. has
completed, that completion doesn't architecturally means that the
result of stwcx. is visible to other processors (or any previous stores
for that matter) (typically, the other processors L1 caches can still
hold the old value).

This has caused an actual crash in RCU torture testing on Power 7

This fixes it by changing those atomic ops to use new macros instead
of RELEASE/ACQUIRE barriers, called ATOMIC_ENTRY and ATMOIC_EXIT barriers,
which are then defined respectively to lwsync and sync.

I haven't had a chance to measure the performance impact (or rather
what I measured with kernel compiles is in the noise, I yet have to
find a more precise benchmark)
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

b97021f8

powerpc/book3e-64: Fix debug support for userspace · 187b9f2a

由 Kumar Gala 提交于 10月 06, 2011

With the introduction of CONFIG_PPC_ADV_DEBUG_REGS user space debug is
broken on Book-E 64-bit parts that support delayed debug events.  When
switch_booke_debug_regs() sets DBCR0 we'll start getting debug events as
MSR_DE is also set and we aren't able to handle debug events from kernel
space.

We can remove the hack that always enables MSR_DE and loads up DBCR0 and
just utilize switch_booke_debug_regs() to get user space debug working
again.

We still need to handle critical/debug exception stacks & proper
save/restore of state for those exception levles to support debug events
from kernel space like we have on 32-bit.
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

187b9f2a

16 11月, 2011 1 次提交

powerpc: Copy down exception vectors after feature fixups · d715e433

由 Anton Blanchard 提交于 11月 14, 2011

kdump fails because we try to execute an HV only instruction. Feature
fixups are being applied after we copy the exception vectors down to 0
so they miss out on any updates.

We have always had this issue but it only became critical in v3.0
when we added CFAR support (breaks POWER5) and v3.1 when we added
POWERNV (breaks everyone).
Signed-off-by: NAnton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> [v3.0+]
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d715e433

08 11月, 2011 2 次提交

powerpc/ps3: Fix lv1_gpu_attribute hcall · 9fce85f7

由 Geoff Levand 提交于 10月 12, 2011

The lv1_gpu_attribute hcall takes three, not five input
arguments.  Adjust the lv1 hcall table and all calls.
Signed-off-by: NGeoff Levand <geoff@infradead.org>
CC: Takashi Iwai <tiwai@suse.de>
Acked-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

9fce85f7

powerpc/irq: Remove IRQF_DISABLED · a3a9f3b4

由 Yong Zhang 提交于 10月 21, 2011

Since commit [e58aa3d2: genirq: Run irq handlers with interrupts disabled],
We run all interrupt handlers with interrupts disabled
and we even check and yell when an interrupt handler
returns with interrupts enabled (see commit [b738a50a:
genirq: Warn when handler enables interrupts]).

So now this flag is a NOOP and can be removed.
Signed-off-by: NYong Zhang <yong.zhang0@gmail.com>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NGeoff Levand <geoff@infradead.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

a3a9f3b4

01 11月, 2011 4 次提交

Cross Memory Attach · fcf63409

由 Christopher Yeoh 提交于 10月 31, 2011

The basic idea behind cross memory attach is to allow MPI programs doing
intra-node communication to do a single copy of the message rather than a
double copy of the message via shared memory.

The following patch attempts to achieve this by allowing a destination
process, given an address and size from a source process, to copy memory
directly from the source process into its own address space via a system
call.  There is also a symmetrical ability to copy from the current
process's address space into a destination process's address space.

- Use of /proc/pid/mem has been considered, but there are issues with
  using it:
  - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
  written to would need to be contiguous.
  - Currently mem_read allows only processes who are currently
  ptrace'ing the target and are still able to ptrace the target to read
  from the target. This check could possibly be moved to the open call,
  but its not clear exactly what race this restriction is stopping
  (reason  appears to have been lost)
  - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
  domain socket is a bit ugly from a userspace point of view,
  especially when you may have hundreds if not (eventually) thousands
  of processes  that all need to do this with each other
  - Doesn't allow for some future use of the interface we would like to
  consider adding in the future (see below)
  - Interestingly reading from /proc/pid/mem currently actually
  involves two copies! (But this could be fixed pretty easily)

As mentioned previously use of vmsplice instead was considered, but has
problems.  Since you need the reader and writer working co-operatively if
the pipe is not drained then you block.  Which requires some wrapping to
do non blocking on the send side or polling on the receive.  In all to all
communication it requires ordering otherwise you can deadlock.  And in the
example of many MPI tasks writing to one MPI task vmsplice serialises the
copying.

There are some cases of MPI collectives where even a single copy interface
does not get us the performance gain we could.  For example in an
MPI_Reduce rather than copy the data from the source we would like to
instead use it directly in a mathops (say the reduce is doing a sum) as
this would save us doing a copy.  We don't need to keep a copy of the data
from the source.  I haven't implemented this, but I think this interface
could in the future do all this through the use of the flags - eg could
specify the math operation and type and the kernel rather than just
copying the data would apply the specified operation between the source
and destination and store it in the destination.

Although we don't have a "second user" of the interface (though I've had
some nibbles from people who may be interested in using it for intra
process messaging which is not MPI).  This interface is something which
hardware vendors are already doing for their custom drivers to implement
fast local communication.  And so in addition to this being useful for
OpenMPI it would mean the driver maintainers don't have to fix things up
when the mm changes.

There was some discussion about how much faster a true zero copy would
go. Here's a link back to the email with some testing I did on that:

http://marc.info/?l=linux-mm&m=130105930902915&w=2

There is a basic man page for the proposed interface here:

http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

This has been implemented for x86 and powerpc, other architecture should
mainly (I think) just need to add syscall numbers for the process_vm_readv
and process_vm_writev. There are 32 bit compatibility versions for
64-bit kernels.

For arch maintainers there are some simple tests to be able to quickly
verify that the syscalls are working correctly here:

http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgzSigned-off-by: NChris Yeoh <yeohc@au1.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <linux-man@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fcf63409

powerpc: fix implicit use of mutex.h by include/asm/spu.h · e415372a

由 Paul Gortmaker 提交于 7月 22, 2011

We've been getting the header implicitly via module.h in the past
but when we clean that up, we'll get this failure:

CC arch/powerpc/platforms/cell/beat_spu_priv1.o
In file included from arch/powerpc/platforms/cell/beat_spu_priv1.c:22:
arch/powerpc/include/asm/spu.h:190: error: field 'list_mutex' has incomplete type
make[2]: *** [arch/powerpc/platforms/cell/beat_spu_priv1.o] Error 1
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

e415372a

powerpc: include export.h for files using EXPORT_SYMBOL/THIS_MODULE · 93087948

由 Paul Gortmaker 提交于 7月 29, 2011

Fix failures in powerpc associated with the previously allowed
implicit module.h presence that now lead to things like this:

arch/powerpc/mm/mmu_context_hash32.c:76:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/powerpc/mm/tlb_hash32.c:48:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
arch/powerpc/kernel/pci_32.c:51:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/powerpc/kernel/iomap.c:36:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
arch/powerpc/platforms/44x/canyonlands.c:126:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
arch/powerpc/kvm/44x.c:168:59: error: 'THIS_MODULE' undeclared (first use in this function)

[with several contibutions from Stephen Rothwell <sfr@canb.auug.org.au>]
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

93087948

powerpc: add export.h to files making use of EXPORT_SYMBOL · 66b15db6

由 Paul Gortmaker 提交于 5月 27, 2011

With module.h being implicitly everywhere via device.h, the absence
of explicitly including something for EXPORT_SYMBOL went unnoticed.
Since we are heading to fix things up and clean module.h from the
device.h file, we need to explicitly include these files now.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

66b15db6

28 10月, 2011 1 次提交

compat: sync compat_stats with statfs. · 1448c721

由 Eric W. Biederman 提交于 10月 17, 2011

This was found by inspection while tracking a similar
bug in compat_statfs64, that has been fixed in mainline
since decemeber.

- This fixes a bug where not all of the f_spare fields
  were cleared on mips and s390.
- Add the f_flags field to struct compat_statfs
- Copy f_flags to userspace in case someone cares.
- Use __clear_user to copy the f_spare field to userspace
  to ensure that all of the elements of f_spare are cleared.
  On some architectures f_spare is has 5 ints and on some
  architectures f_spare only has 4 ints.  Which makes
  the previous technique of clearing each int individually
  broken.

I don't expect anyone actually uses the old statfs system
call anymore but if they do let them benefit from having
the compat and the native version working the same.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

1448c721

07 10月, 2011 1 次提交

powerpc/fsl-booke: Handle L1 D-cache parity error correctly on e500mc · 37caf9f2

由 Kumar Gala 提交于 8月 27, 2011

If the L1 D-Cache is in write shadow mode the HW will auto-recover the
error.  However we might still log the error and cause a machine check
(if L1CSR0[CPE] - Cache error checking enable).  We should only treat
the non-write shadow case as non-recoverable.
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

37caf9f2

28 9月, 2011 1 次提交

doc: fix broken references · 395cf969

由 Paul Bolle 提交于 8月 15, 2011

There are numerous broken references to Documentation files (in other
Documentation files, in comments, etc.). These broken references are
caused by typo's in the references, and by renames or removals of the
Documentation files. Some broken references are simply odd.

Fix these broken references, sometimes by dropping the irrelevant text
they were part of.
Signed-off-by: NPaul Bolle <pebolle@tiscali.nl>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

395cf969

26 9月, 2011 7 次提交

KVM: PPC: Implement H_CEDE hcall for book3s_hv in real-mode code · 19ccb76a

由 Paul Mackerras 提交于 7月 23, 2011

With a KVM guest operating in SMT4 mode (i.e. 4 hardware threads per
core), whenever a CPU goes idle, we have to pull all the other
hardware threads in the core out of the guest, because the H_CEDE
hcall is handled in the kernel. This is inefficient.

This adds code to book3s_hv_rmhandlers.S to handle the H_CEDE hcall
in real mode. When a guest vcpu does an H_CEDE hcall, we now only
exit to the kernel if all the other vcpus in the same core are also
idle. Otherwise we mark this vcpu as napping, save state that could
be lost in nap mode (mainly GPRs and FPRs), and execute the nap
instruction. When the thread wakes up, because of a decrementer or
external interrupt, we come back in at kvm_start_guest (from the
system reset interrupt vector), find the `napping' flag set in the
paca, and go to the resume path.

This has some other ramifications. First, when starting a core, we
now start all the threads, both those that are immediately runnable and
those that are idle. This is so that we don't have to pull all the
threads out of the guest when an idle thread gets a decrementer interrupt
and wants to start running. In fact the idle threads will all start
with the H_CEDE hcall returning; being idle they will just do another
H_CEDE immediately and go to nap mode.

This required some changes to kvmppc_run_core() and kvmppc_run_vcpu().
These functions have been restructured to make them simpler and clearer.
We introduce a level of indirection in the wait queue that gets woken
when external and decrementer interrupts get generated for a vcpu, so
that we can have the 4 vcpus in a vcore using the same wait queue.
We need this because the 4 vcpus are being handled by one thread.

Secondly, when we need to exit from the guest to the kernel, we now
have to generate an IPI for any napping threads, because an HDEC
interrupt doesn't wake up a napping thread.

Thirdly, we now need to be able to handle virtual external interrupts
and decrementer interrupts becoming pending while a thread is napping,
and deliver those interrupts to the guest when the thread wakes.
This is done in kvmppc_cede_reentry, just before fast_guest_return.

Finally, since we are not using the generic kvm_vcpu_block for book3s_hv,
and hence not calling kvm_arch_vcpu_runnable, we can remove the #ifdef
from kvm_arch_vcpu_runnable.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

19ccb76a

KVM: PPC: book3s_pr: Simplify transitions between virtual and real mode · 02143947

由 Paul Mackerras 提交于 7月 23, 2011

This simplifies the way that the book3s_pr makes the transition to
real mode when entering the guest.  We now call kvmppc_entry_trampoline
(renamed from kvmppc_rmcall) in the base kernel using a normal function
call instead of doing an indirect call through a pointer in the vcpu.
If kvm is a module, the module loader takes care of generating a
trampoline as it does for other calls to functions outside the module.

kvmppc_entry_trampoline then disables interrupts and jumps to
kvmppc_handler_trampoline_enter in real mode using an rfi[d].
That then uses the link register as the address to return to
(potentially in module space) when the guest exits.

This also simplifies the way that we call the Linux interrupt handler
when we exit the guest due to an external, decrementer or performance
monitor interrupt.  Instead of turning on the MMU, then deciding that
we need to call the Linux handler and turning the MMU back off again,
we now go straight to the handler at the point where we would turn the
MMU on.  The handler will then return to the virtual-mode code
(potentially in the module).

Along the way, this moves the setting and clearing of the HID5 DCBZ32
bit into real-mode interrupts-off code, and also makes sure that
we clear the MSR[RI] bit before loading values into SRR0/1.

The net result is that we no longer need any code addresses to be
stored in vcpu->arch.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAlexander Graf <agraf@suse.de>

02143947

KVM: PPC: Add sanity checking to vcpu_run · af8f38b3

由 Alexander Graf 提交于 8月 10, 2011

There are multiple features in PowerPC KVM that can now be enabled
depending on the user's wishes. Some of the combinations don't make
sense or don't work though.

So this patch adds a way to check if the executing environment would
actually be able to run the guest properly. It also adds sanity
checks if PVR is set (should always be true given the current code
flow), if PAPR is only used with book3s_64 where it works and that
HV KVM is only used in PAPR mode.
Signed-off-by: NAlexander Graf <agraf@suse.de>

af8f38b3

KVM: PPC: Add PAPR hypercall code for PR mode · 0254f074

由 Alexander Graf 提交于 8月 08, 2011

When running a PAPR guest, we need to handle a few hypercalls in kernel space,
most prominently the page table invalidation (to sync the shadows).

So this patch adds handling for a few PAPR hypercalls to PR mode KVM. I tried
to share the code with HV mode, but it ended up being a lot easier this way
around, as the two differ too much in those details.
Signed-off-by: NAlexander Graf <agraf@suse.de>

---

v1 -> v2:

  - whitespace fix

0254f074

KVM: PPC: Add support for explicit HIOR setting · a15bd354

由 Alexander Graf 提交于 8月 08, 2011

Until now, we always set HIOR based on the PVR, but this is just wrong.
Instead, we should be setting HIOR explicitly, so user space can decide
what the initial HIOR value is - just like on real hardware.

We keep the old PVR based way around for backwards compatibility, but
once user space uses the SREGS based method, we drop the PVR logic.
Signed-off-by: NAlexander Graf <agraf@suse.de>

a15bd354

KVM: PPC: Add papr_enabled flag · 9432ba60

由 Alexander Graf 提交于 8月 08, 2011

When running a PAPR guest, some things change. The privilege level drops
from hypervisor to supervisor, SDR1 gets treated differently and we interpret
hypercalls. For bisectability sake, add the flag now, but only enable it when
all the support code is there.
Signed-off-by: NAlexander Graf <agraf@suse.de>

9432ba60

KVM: PPC: move compute_tlbie_rb to book3s common header · db507c30

由 Alexander Graf 提交于 7月 08, 2011

We need the compute_tlbie_rb in _pr and _hv implementations for papr
soon, so let's move it over to a common header file that both
implementations can leverage.
Signed-off-by: NAlexander Graf <agraf@suse.de>

db507c30

20 9月, 2011 8 次提交

powerpc/powernv: Machine check and other system interrupts · ed79ba9e

由 Benjamin Herrenschmidt 提交于 9月 19, 2011

OPAL can handle various interrupt for us such as Machine Checks (it
performs all sorts of recovery tasks and passes back control to us with
informations about the error), Hardware Management Interrupts and Softpatch
interrupts.

This wires up the mechanisms and prints out specific informations returned
by HAL when a machine check occurs.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

ed79ba9e

powerpc/powernv: Add OPAL ICS backend · 5c7c1e94

由 Benjamin Herrenschmidt 提交于 9月 19, 2011

OPAL handles HW access to the various ICS or equivalent chips
for us (with the exception of p5ioc2 based HEA which uses a

different backend) similarily to what RTAS does on pSeries.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

5c7c1e94

powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks · 628daa8d

由 Benjamin Herrenschmidt 提交于 9月 19, 2011

Implements OPAL RTC and NVRAM support and wire all that up to
the powernv platform.

We use RTAS for RTC as a fallback if available. Using RTAS for nvram
is not supported yet, pending some rework/cleanup and generalization
of the pSeries & CHRP code. We also use RTAS fallbacks for power off
and reboot
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

628daa8d

powerpc/powernv: Support for OPAL console · daea1175

由 Benjamin Herrenschmidt 提交于 9月 19, 2011

This adds a udbg and an hvc console backend for supporting a console
using the OPAL console interfaces.

On OPAL v1 we have hvc0 mapped to whatever console the system was
configured for (network or hvsi serial port) via the service
processor.

On OPAL v2 we have hvcN mapped to the Nth console provided by OPAL
which generally corresponds to:

	hvc0 : network console (raw protocol)
	hvc1 : serial port S1 (hvsi)
	hvc2 : serial port S2 (hvsi)

Note: At this point, early debug console only works with OPAL v1
and shouldn't be enabled in a normal kernel.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

daea1175

powerpc/powernv: Basic support for OPAL · 14a43e69

由 Benjamin Herrenschmidt 提交于 9月 19, 2011

Add definition of OPAL interfaces along with  the wrappers to call
into OPAL runtime and the early device-tree parsing hook to locate
the OPAL runtime firmware.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

14a43e69

powerpc/powernv: Add OPAL takeover from PowerVM · 27f44888

由 Benjamin Herrenschmidt 提交于 9月 19, 2011

On machines supporting the OPAL firmware version 1, the system
is initially booted under pHyp. We then use a special hypercall
to verify if OPAL is available and if it is, we then trigger
a "takeover" which disables pHyp and loads the OPAL runtime
firmware, giving control to the kernel in hypervisor mode.

This patch add the necessary code to detect that the OPAL takeover
capability is present when running under PowerVM (aka pHyp) and
perform said takeover to get hypervisor control of the processor.

To perform the takeover, we must first use RTAS (within Open
Firmware runtime environment) to start all processors & threads,
in order to give control to OPAL on all of them. We then call
the takeover hypercall on everybody, OPAL will re-enter the kernel
main entry point passing it a flat device-tree.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

27f44888

powerpc/smp: More generic support for "soft hotplug" · fb82b839

由 Benjamin Herrenschmidt 提交于 9月 19, 2011

This adds more generic support for doing CPU hotplug with a simple
idle loop and no actual reset of the processors. The generic
smp_generic_kick_cpu() does the hotplug bringup trick if the PACA
shows that the CPU has already been started at boot and we provide
an accessor for the CPU state.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fb82b839

powerpc: Fix oops when echoing bad values to /sys/devices/system/memory/probe · a1194097

由 Anton Blanchard 提交于 8月 10, 2011

If we echo an address the hypervisor doesn't like to
/sys/devices/system/memory/probe we oops the box:

# echo 0x10000000000 > /sys/devices/system/memory/probe

kernel BUG at arch/powerpc/mm/hash_utils_64.c:541!

The backtrace is:

create_section_mapping
arch_add_memory
add_memory
memory_probe_store
sysdev_class_store
sysfs_write_file
vfs_write
SyS_write

In create_section_mapping we BUG if htab_bolt_mapping returned
an error. A better approach is to return an error which will
propagate back to userspace.

Rerunning the test with this patch applied:

# echo 0x10000000000 > /sys/devices/system/memory/probe
-bash: echo: write error: Invalid argument
Signed-off-by: NAnton Blanchard <anton@samba.org>
Cc: stable@kernel.org
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

a1194097

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功