1. 12 7月, 2011 4 次提交
    • P
      powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits · 969391c5
      Paul Mackerras 提交于
      This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to
      indicate that we have a usable hypervisor mode, and another to indicate
      that the processor conforms to PowerISA version 2.06.  We also add
      another bit to indicate that the processor conforms to ISA version 2.01
      and set that for PPC970 and derivatives.
      
      Some PPC970 chips (specifically those in Apple machines) have a
      hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode
      is not useful in the sense that there is no way to run any code in
      supervisor mode (HV=0 PR=0).  On these processors, the LPES0 and LPES1
      bits in HID4 are always 0, and we use that as a way of detecting that
      hypervisor mode is not useful.
      
      Where we have a feature section in assembly code around code that
      only applies on POWER7 in hypervisor mode, we use a construct like
      
      END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
      
      The definition of END_FTR_SECTION_IFSET is such that the code will
      be enabled (not overwritten with nops) only if all bits in the
      provided mask are set.
      
      Note that the CPU feature check in __tlbie() only needs to check the
      ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called
      if we are running bare-metal, i.e. in hypervisor mode.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      969391c5
    • P
      KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests · aa04b4cc
      Paul Mackerras 提交于
      This adds infrastructure which will be needed to allow book3s_hv KVM to
      run on older POWER processors, including PPC970, which don't support
      the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
      Offset (RMO) facility.  These processors require a physically
      contiguous, aligned area of memory for each guest.  When the guest does
      an access in real mode (MMU off), the address is compared against a
      limit value, and if it is lower, the address is ORed with an offset
      value (from the Real Mode Offset Register (RMOR)) and the result becomes
      the real address for the access.  The size of the RMA has to be one of
      a set of supported values, which usually includes 64MB, 128MB, 256MB
      and some larger powers of 2.
      
      Since we are unlikely to be able to allocate 64MB or more of physically
      contiguous memory after the kernel has been running for a while, we
      allocate a pool of RMAs at boot time using the bootmem allocator.  The
      size and number of the RMAs can be set using the kvm_rma_size=xx and
      kvm_rma_count=xx kernel command line options.
      
      KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
      of the pool of preallocated RMAs.  The capability value is 1 if the
      processor can use an RMA but doesn't require one (because it supports
      the VRMA facility), or 2 if the processor requires an RMA for each guest.
      
      This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
      pool and returns a file descriptor which can be used to map the RMA.  It
      also returns the size of the RMA in the argument structure.
      
      Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION
      ioctl calls from userspace.  To cope with this, we now preallocate the
      kvm->arch.ram_pginfo array when the VM is created with a size sufficient
      for up to 64GB of guest memory.  Subsequently we will get rid of this
      array and use memory associated with each memslot instead.
      
      This moves most of the code that translates the user addresses into
      host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
      to kvmppc_core_prepare_memory_region.  Also, instead of having to look
      up the VMA for each page in order to check the page size, we now check
      that the pages we get are compound pages of 16MB.  However, if we are
      adding memory that is mapped to an RMA, we don't bother with calling
      get_user_pages_fast and instead just offset from the base pfn for the
      RMA.
      
      Typically the RMA gets added after vcpus are created, which makes it
      inconvenient to have the LPCR (logical partition control register) value
      in the vcpu->arch struct, since the LPCR controls whether the processor
      uses RMA or VRMA for the guest.  This moves the LPCR value into the
      kvm->arch struct and arranges for the MER (mediated external request)
      bit, which is the only bit that varies between vcpus, to be set in
      assembly code when going into the guest if there is a pending external
      interrupt request.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      aa04b4cc
    • P
      KVM: PPC: Add support for Book3S processors in hypervisor mode · de56a948
      Paul Mackerras 提交于
      This adds support for KVM running on 64-bit Book 3S processors,
      specifically POWER7, in hypervisor mode.  Using hypervisor mode means
      that the guest can use the processor's supervisor mode.  That means
      that the guest can execute privileged instructions and access privileged
      registers itself without trapping to the host.  This gives excellent
      performance, but does mean that KVM cannot emulate a processor
      architecture other than the one that the hardware implements.
      
      This code assumes that the guest is running paravirtualized using the
      PAPR (Power Architecture Platform Requirements) interface, which is the
      interface that IBM's PowerVM hypervisor uses.  That means that existing
      Linux distributions that run on IBM pSeries machines will also run
      under KVM without modification.  In order to communicate the PAPR
      hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
      to include/linux/kvm.h.
      
      Currently the choice between book3s_hv support and book3s_pr support
      (i.e. the existing code, which runs the guest in user mode) has to be
      made at kernel configuration time, so a given kernel binary can only
      do one or the other.
      
      This new book3s_hv code doesn't support MMIO emulation at present.
      Since we are running paravirtualized guests, this isn't a serious
      restriction.
      
      With the guest running in supervisor mode, most exceptions go straight
      to the guest.  We will never get data or instruction storage or segment
      interrupts, alignment interrupts, decrementer interrupts, program
      interrupts, single-step interrupts, etc., coming to the hypervisor from
      the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
      exception entry path so that we don't have to do the KVM test on entry
      to those exception handlers.
      
      We do however get hypervisor decrementer, hypervisor data storage,
      hypervisor instruction storage, and hypervisor emulation assist
      interrupts, so we have to handle those.
      
      In hypervisor mode, real-mode accesses can access all of RAM, not just
      a limited amount.  Therefore we put all the guest state in the vcpu.arch
      and use the shadow_vcpu in the PACA only for temporary scratch space.
      We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
      anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
      We don't have a shared page with the guest, but we still need a
      kvm_vcpu_arch_shared struct to store the values of various registers,
      so we include one in the vcpu_arch struct.
      
      The POWER7 processor has a restriction that all threads in a core have
      to be in the same partition.  MMU-on kernel code counts as a partition
      (partition 0), so we have to do a partition switch on every entry to and
      exit from the guest.  At present we require the host and guest to run
      in single-thread mode because of this hardware restriction.
      
      This code allocates a hashed page table for the guest and initializes
      it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
      require that the guest memory is allocated using 16MB huge pages, in
      order to simplify the low-level memory management.  This also means that
      we can get away without tracking paging activity in the host for now,
      since huge pages can't be paged or swapped.
      
      This also adds a few new exports needed by the book3s_hv code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      de56a948
    • P
      powerpc: Set up LPCR for running guest partitions · 923c53ca
      Paul Mackerras 提交于
      In hypervisor mode, the LPCR controls several aspects of guest
      partitions, including virtual partition memory mode, and also controls
      whether the hypervisor decrementer interrupts are enabled.  This sets
      up LPCR at boot time so that guest partitions will use a virtual real
      memory area (VRMA) composed of 16MB large pages, and hypervisor
      decrementer interrupts are disabled.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      923c53ca
  2. 20 5月, 2011 1 次提交
  3. 04 5月, 2011 2 次提交
    • P
      powerpc: Save Come-From Address Register (CFAR) in exception frame · 48404f2e
      Paul Mackerras 提交于
      Recent 64-bit server processors (POWER6 and POWER7) have a "Come-From
      Address Register" (CFAR), that records the address of the most recent
      branch or rfid (return from interrupt) instruction for debugging purposes.
      
      This saves the value of the CFAR in the exception entry code and stores
      it in the exception frame.  We also make xmon print the CFAR value in
      its register dump code.
      
      Rather than extend the pt_regs struct at this time, we steal the orig_gpr3
      field, which is only used for system calls, and use it for the CFAR value
      for all exceptions/interrupts other than system calls.  This means we
      don't save the CFAR on system calls, which is not a great problem since
      system calls tend not to happen unexpectedly, and also avoids adding the
      overhead of reading the CFAR to the system call entry path.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      48404f2e
    • T
      powerpc: Add Initiate Coprocessor Store Word (icswx) support · 851d2e2f
      Tseng-Hui (Frank) Lin 提交于
      Icswx is a PowerPC instruction to send data to a co-processor. On Book-S
      processors the LPAR_ID and process ID (PID) of the owning process are
      registered in the window context of the co-processor at initialization
      time. When the icswx instruction is executed the L2 generates a cop-reg
      transaction on PowerBus. The transaction has no address and the
      processor does not perform an MMU access to authenticate the transaction.
      The co-processor compares the LPAR_ID and the PID included in the
      transaction and the LPAR_ID and PID held in the window context to
      determine if the process is authorized to generate the transaction.
      
      The OS needs to assign a 16-bit PID for the process. This cop-PID needs
      to be updated during context switch. The cop-PID needs to be destroyed
      when the context is destroyed.
      Signed-off-by: NSonny Rao <sonnyrao@linux.vnet.ibm.com>
      Signed-off-by: NTseng-Hui (Frank) Lin <thlin@linux.vnet.ibm.com>
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      851d2e2f
  4. 27 4月, 2011 1 次提交
  5. 20 4月, 2011 3 次提交
  6. 16 3月, 2011 1 次提交
  7. 15 3月, 2011 1 次提交
    • L
      powerpc/85xx: Workaroudn e500 CPU erratum A005 · ac6f1203
      Liu Yu 提交于
      This erratum can occur if a single-precision floating-point,
      double-precision floating-point or vector floating-point instruction on a
      mispredicted branch path signals one of the floating-point data interrupts
      which are enabled by the SPEFSCR (FINVE, FDBZE, FUNFE or FOVFE bits).  This
      interrupt must be recorded in a one-cycle window when the misprediction is
      resolved.  If this extremely rare event should occur, the result could be:
      
      The SPE Data Exception from the mispredicted path may be reported
      erroneously if a single-precision floating-point, double-precision
      floating-point or vector floating-point instruction is the second
      instruction on the correct branch path.
      
      According to errata description, some efp instructions which are not
      supposed to trigger SPE exceptions can trigger the exceptions in this case.
      However, as we haven't emulated these instructions here, a signal will
      send to userspace, and userspace application would exit.
      
      This patch re-issue the efp instruction that we haven't emulated,
      so that hardware can properly execute it again if this case happen.
      Signed-off-by: NLiu Yu <yu.liu@freescale.com>
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      ac6f1203
  8. 04 3月, 2011 1 次提交
  9. 13 1月, 2011 1 次提交
  10. 24 8月, 2010 1 次提交
  11. 09 7月, 2010 1 次提交
  12. 05 5月, 2010 2 次提交
  13. 25 4月, 2010 1 次提交
  14. 01 3月, 2010 1 次提交
  15. 28 10月, 2009 1 次提交
  16. 20 8月, 2009 5 次提交
  17. 16 6月, 2009 1 次提交
    • B
      powerpc: Add memory clobber to mtspr() · 2fae0a52
      Benjamin Herrenschmidt 提交于
      Without this clobber, mtspr can be re-ordered by gcc vs. surrounding
      memory accesses. While this might be ok for some cases, it's not in
      others and I'm not confident that all callers get it right (In fact
      I'm sure some of them don't).
      
      So for now, let's make mtspr() itself contain a memory clobber until
      we can audit and fix everything, at which point we can remove it
      if we think it's worth doing so.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2fae0a52
  18. 15 6月, 2009 1 次提交
    • P
      powerpc: Add compiler memory barrier to mtmsr macro · 4c75f84f
      Paul Mackerras 提交于
      On 32-bit non-Book E, local_irq_restore() turns into just mtmsr(),
      which doesn't currently have a compiler memory barrier.  This means
      that accesses to memory inside a local_irq_save/restore section,
      or a spin_lock_irqsave/spin_unlock_irqrestore section on UP, can
      be reordered by the compiler to occur outside that section.
      
      To fix this, this adds a compiler memory barrier to mtmsr for both
      32-bit and 64-bit.  Having a compiler memory barrier in mtmsr makes
      sense because it will almost always be changing something about the
      context in which memory accesses are done, so in general we don't want
      memory accesses getting moved from one side of an mtmsr to the other.
      
      With the barrier in mtmsr(), some of the explicit barriers in
      hw_irq.h are now redundant, so this removes them.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      4c75f84f
  19. 15 5月, 2009 1 次提交
    • P
      perf_counter: powerpc: supply more precise information on counter overflow events · 0bbd0d4b
      Paul Mackerras 提交于
      This uses values from the MMCRA, SIAR and SDAR registers on
      powerpc to supply more precise information for overflow events,
      including a data address when PERF_RECORD_ADDR is specified.
      
      Since POWER6 uses different bit positions in MMCRA from earlier
      processors, this converts the struct power_pmu limited_pmc5_6
      field, which only had 0/1 values, into a flags field and
      defines bit values for its previous use (PPMU_LIMITED_PMC5_6)
      and a new flag (PPMU_ALT_SIPR) to indicate that the processor
      uses the POWER6 bit positions rather than the earlier
      positions.  It also adds definitions in reg.h for the new and
      old positions of the bit that indicates that the SIAR and SDAR
      values come from the same instruction.
      
      For the data address, the SDAR value is supplied if we are not
      doing instruction sampling.  In that case there is no guarantee
      that the address given in the PERF_RECORD_ADDR subrecord will
      correspond to the instruction whose address is given in the
      PERF_RECORD_IP subrecord.
      
      If instruction sampling is enabled (e.g. because this counter
      is counting a marked instruction event), then we only supply
      the SDAR value for the PERF_RECORD_ADDR subrecord if it
      corresponds to the instruction whose address is in the
      PERF_RECORD_IP subrecord.  Otherwise we supply 0.
      
      [ Impact: support more PMU hardware features on PowerPC ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18955.37028.48861.555309@drongo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0bbd0d4b
  20. 02 4月, 2009 1 次提交
  21. 11 3月, 2009 1 次提交
  22. 23 12月, 2008 1 次提交
  23. 04 8月, 2008 1 次提交
  24. 17 7月, 2008 1 次提交
  25. 01 7月, 2008 2 次提交
    • M
      powerpc: Add VSX context save/restore, ptrace and signal support · ce48b210
      Michael Neuling 提交于
      This patch extends the floating point save and restore code to use the
      VSX load/stores when VSX is available.  This will make FP context
      save/restore marginally slower on FP only code, when VSX is available,
      as it has to load/store 128bits rather than just 64bits.
      
      Mixing FP, VMX and VSX code will get constant architected state.
      
      The signals interface is extended to enable access to VSR 0-31
      doubleword 1 after discussions with tool chain maintainers.  Backward
      compatibility is maintained.
      
      The ptrace interface is also extended to allow access to VSR 0-31 full
      registers.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      ce48b210
    • M
      powerpc: Introduce infrastructure for feature sections with alternatives · fac23fe4
      Michael Ellerman 提交于
      The current feature section logic only supports nop'ing out code, this means
      if you want to choose at runtime between instruction sequences, one or both
      cases will have to execute the nop'ed out contents of the other section, eg:
      
      BEGIN_FTR_SECTION
      	or	1,1,1
      END_FTR_SECTION_IFSET(FOO)
      BEGIN_FTR_SECTION
      	or	2,2,2
      END_FTR_SECTION_IFCLR(FOO)
      
      and the resulting code will be either,
      
      	or	1,1,1
      	nop
      
      or,
      	nop
      	or	2,2,2
      
      For small code segments this is fine, but for larger code blocks and in
      performance criticial code segments, it would be nice to avoid the nops.
      This commit starts to implement logic to allow the following:
      
      BEGIN_FTR_SECTION
      	or	1,1,1
      FTR_SECTION_ELSE
      	or	2,2,2
      ALT_FTR_SECTION_END_IFSET(FOO)
      
      and the resulting code will be:
      
      	or	1,1,1
      or,
      	or	2,2,2
      
      We achieve this by extending the existing FTR macros. The current feature
      section semantic just becomes a special case, ie. if the else case is empty
      we nop out the default case.
      
      The key limitation is that the size of the else case must be less than or
      equal to the size of the default case. If the else case is smaller the
      remainder of the section is nop'ed.
      
      We let the linker put the else case code in with the rest of the text,
      so that relative branches from the else case are more likley to link,
      this has the disadvantage that we can't free the unused else cases.
      
      This commit introduces the required macro and linker script changes, but
      does not enable the patching of the alternative sections.
      
      We also need to update two hand-made section entries in reg.h and timex.h
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      fac23fe4
  26. 26 6月, 2008 1 次提交
  27. 03 3月, 2008 1 次提交
  28. 06 2月, 2008 1 次提交