1. 07 8月, 2014 1 次提交
  2. 17 10月, 2013 3 次提交
  3. 08 7月, 2013 1 次提交
    • A
      powerpc/kvm: Contiguous memory allocator based hash page table allocation · fa61a4e3
      Aneesh Kumar K.V 提交于
      Powerpc architecture uses a hash based page table mechanism for mapping virtual
      addresses to physical address. The architecture require this hash page table to
      be physically contiguous. With KVM on Powerpc currently we use early reservation
      mechanism for allocating guest hash page table. This implies that we need to
      reserve a big memory region to ensure we can create large number of guest
      simultaneously with KVM on Power. Another disadvantage is that the reserved memory
      is not available to rest of the subsystems and and that implies we limit the total
      available memory in the host.
      
      This patch series switch the guest hash page table allocation to use
      contiguous memory allocator.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      fa61a4e3
  4. 19 5月, 2013 1 次提交
  5. 27 4月, 2013 5 次提交
  6. 25 1月, 2013 1 次提交
  7. 06 12月, 2012 2 次提交
    • P
      KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking · b4072df4
      Paul Mackerras 提交于
      Currently, if a machine check interrupt happens while we are in the
      guest, we exit the guest and call the host's machine check handler,
      which tends to cause the host to panic.  Some machine checks can be
      triggered by the guest; for example, if the guest creates two entries
      in the SLB that map the same effective address, and then accesses that
      effective address, the CPU will take a machine check interrupt.
      
      To handle this better, when a machine check happens inside the guest,
      we call a new function, kvmppc_realmode_machine_check(), while still in
      real mode before exiting the guest.  On POWER7, it handles the cases
      that the guest can trigger, either by flushing and reloading the SLB,
      or by flushing the TLB, and then it delivers the machine check interrupt
      directly to the guest without going back to the host.  On POWER7, the
      OPAL firmware patches the machine check interrupt vector so that it
      gets control first, and it leaves behind its analysis of the situation
      in a structure pointed to by the opal_mc_evt field of the paca.  The
      kvmppc_realmode_machine_check() function looks at this, and if OPAL
      reports that there was no error, or that it has handled the error, we
      also go straight back to the guest with a machine check.  We have to
      deliver a machine check to the guest since the machine check interrupt
      might have trashed valid values in SRR0/1.
      
      If the machine check is one we can't handle in real mode, and one that
      OPAL hasn't already handled, or on PPC970, we exit the guest and call
      the host's machine check handler.  We do this by jumping to the
      machine_check_fwnmi label, rather than absolute address 0x200, because
      we don't want to re-execute OPAL's handler on POWER7.  On PPC970, the
      two are equivalent because address 0x200 just contains a branch.
      
      Then, if the host machine check handler decides that the system can
      continue executing, kvmppc_handle_exit() delivers a machine check
      interrupt to the guest -- once again to let the guest know that SRR0/1
      have been modified.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix checkpatch warnings]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b4072df4
    • A
      KVM: PPC: Support eventfd · 0e673fb6
      Alexander Graf 提交于
      In order to support the generic eventfd infrastructure on PPC, we need
      to call into the generic KVM in-kernel device mmio code.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      0e673fb6
  8. 06 5月, 2012 1 次提交
  9. 08 4月, 2012 2 次提交
  10. 26 9月, 2011 2 次提交
    • P
      KVM: PPC: Assemble book3s{,_hv}_rmhandlers.S separately · 177339d7
      Paul Mackerras 提交于
      This makes arch/powerpc/kvm/book3s_rmhandlers.S and
      arch/powerpc/kvm/book3s_hv_rmhandlers.S be assembled as
      separate compilation units rather than having them #included in
      arch/powerpc/kernel/exceptions-64s.S.  We no longer have any
      conditional branches between the exception prologs in
      exceptions-64s.S and the KVM handlers, so there is no need to
      keep their contents close together in the vmlinux image.
      
      In their current location, they are using up part of the limited
      space between the first-level interrupt handlers and the firmware
      NMI data area at offset 0x7000, and with some kernel configurations
      this area will overflow (e.g. allyesconfig), leading to an
      "attempt to .org backwards" error when compiling exceptions-64s.S.
      
      Moving them out requires that we add some #includes that the
      book3s_{,hv_}rmhandlers.S code was previously getting implicitly
      via exceptions-64s.S.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      177339d7
    • A
      KVM: PPC: Add PAPR hypercall code for PR mode · 0254f074
      Alexander Graf 提交于
      When running a PAPR guest, we need to handle a few hypercalls in kernel space,
      most prominently the page table invalidation (to sync the shadows).
      
      So this patch adds handling for a few PAPR hypercalls to PR mode KVM. I tried
      to share the code with HV mode, but it ended up being a lot easier this way
      around, as the two differ too much in those details.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      
      ---
      
      v1 -> v2:
      
        - whitespace fix
      0254f074
  11. 12 7月, 2011 5 次提交
    • P
      KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests · aa04b4cc
      Paul Mackerras 提交于
      This adds infrastructure which will be needed to allow book3s_hv KVM to
      run on older POWER processors, including PPC970, which don't support
      the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
      Offset (RMO) facility.  These processors require a physically
      contiguous, aligned area of memory for each guest.  When the guest does
      an access in real mode (MMU off), the address is compared against a
      limit value, and if it is lower, the address is ORed with an offset
      value (from the Real Mode Offset Register (RMOR)) and the result becomes
      the real address for the access.  The size of the RMA has to be one of
      a set of supported values, which usually includes 64MB, 128MB, 256MB
      and some larger powers of 2.
      
      Since we are unlikely to be able to allocate 64MB or more of physically
      contiguous memory after the kernel has been running for a while, we
      allocate a pool of RMAs at boot time using the bootmem allocator.  The
      size and number of the RMAs can be set using the kvm_rma_size=xx and
      kvm_rma_count=xx kernel command line options.
      
      KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
      of the pool of preallocated RMAs.  The capability value is 1 if the
      processor can use an RMA but doesn't require one (because it supports
      the VRMA facility), or 2 if the processor requires an RMA for each guest.
      
      This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
      pool and returns a file descriptor which can be used to map the RMA.  It
      also returns the size of the RMA in the argument structure.
      
      Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION
      ioctl calls from userspace.  To cope with this, we now preallocate the
      kvm->arch.ram_pginfo array when the VM is created with a size sufficient
      for up to 64GB of guest memory.  Subsequently we will get rid of this
      array and use memory associated with each memslot instead.
      
      This moves most of the code that translates the user addresses into
      host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
      to kvmppc_core_prepare_memory_region.  Also, instead of having to look
      up the VMA for each page in order to check the page size, we now check
      that the pages we get are compound pages of 16MB.  However, if we are
      adding memory that is mapped to an RMA, we don't bother with calling
      get_user_pages_fast and instead just offset from the base pfn for the
      RMA.
      
      Typically the RMA gets added after vcpus are created, which makes it
      inconvenient to have the LPCR (logical partition control register) value
      in the vcpu->arch struct, since the LPCR controls whether the processor
      uses RMA or VRMA for the guest.  This moves the LPCR value into the
      kvm->arch struct and arranges for the MER (mediated external request)
      bit, which is the only bit that varies between vcpus, to be set in
      assembly code when going into the guest if there is a pending external
      interrupt request.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      aa04b4cc
    • D
      KVM: PPC: Accelerate H_PUT_TCE by implementing it in real mode · 54738c09
      David Gibson 提交于
      This improves I/O performance for guests using the PAPR
      paravirtualization interface by making the H_PUT_TCE hcall faster, by
      implementing it in real mode.  H_PUT_TCE is used for updating virtual
      IOMMU tables, and is used both for virtual I/O and for real I/O in the
      PAPR interface.
      
      Since this moves the IOMMU tables into the kernel, we define a new
      KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables.  The
      ioctl returns a file descriptor which can be used to mmap the newly
      created table.  The qemu driver models use them in the same way as
      userspace managed tables, but they can be updated directly by the
      guest with a real-mode H_PUT_TCE implementation, reducing the number
      of host/guest context switches during guest IO.
      
      There are certain circumstances where it is useful for userland qemu
      to write to the TCE table even if the kernel H_PUT_TCE path is used
      most of the time.  Specifically, allowing this will avoid awkwardness
      when we need to reset the table.  More importantly, we will in the
      future need to write the table in order to restore its state after a
      checkpoint resume or migration.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      54738c09
    • P
      KVM: PPC: Handle some PAPR hcalls in the kernel · a8606e20
      Paul Mackerras 提交于
      This adds the infrastructure for handling PAPR hcalls in the kernel,
      either early in the guest exit path while we are still in real mode,
      or later once the MMU has been turned back on and we are in the full
      kernel context.  The advantage of handling hcalls in real mode if
      possible is that we avoid two partition switches -- and this will
      become more important when we support SMT4 guests, since a partition
      switch means we have to pull all of the threads in the core out of
      the guest.  The disadvantage is that we can only access the kernel
      linear mapping, not anything vmalloced or ioremapped, since the MMU
      is off.
      
      This also adds code to handle the following hcalls in real mode:
      
      H_ENTER       Add an HPTE to the hashed page table
      H_REMOVE      Remove an HPTE from the hashed page table
      H_READ        Read HPTEs from the hashed page table
      H_PROTECT     Change the protection bits in an HPTE
      H_BULK_REMOVE Remove up to 4 HPTEs from the hashed page table
      H_SET_DABR    Set the data address breakpoint register
      
      Plus code to handle the following hcalls in the kernel:
      
      H_CEDE        Idle the vcpu until an interrupt or H_PROD hcall arrives
      H_PROD        Wake up a ceded vcpu
      H_REGISTER_VPA Register a virtual processor area (VPA)
      
      The code that runs in real mode has to be in the base kernel, not in
      the module, if KVM is compiled as a module.  The real-mode code can
      only access the kernel linear mapping, not vmalloc or ioremap space.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a8606e20
    • P
      KVM: PPC: Add support for Book3S processors in hypervisor mode · de56a948
      Paul Mackerras 提交于
      This adds support for KVM running on 64-bit Book 3S processors,
      specifically POWER7, in hypervisor mode.  Using hypervisor mode means
      that the guest can use the processor's supervisor mode.  That means
      that the guest can execute privileged instructions and access privileged
      registers itself without trapping to the host.  This gives excellent
      performance, but does mean that KVM cannot emulate a processor
      architecture other than the one that the hardware implements.
      
      This code assumes that the guest is running paravirtualized using the
      PAPR (Power Architecture Platform Requirements) interface, which is the
      interface that IBM's PowerVM hypervisor uses.  That means that existing
      Linux distributions that run on IBM pSeries machines will also run
      under KVM without modification.  In order to communicate the PAPR
      hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
      to include/linux/kvm.h.
      
      Currently the choice between book3s_hv support and book3s_pr support
      (i.e. the existing code, which runs the guest in user mode) has to be
      made at kernel configuration time, so a given kernel binary can only
      do one or the other.
      
      This new book3s_hv code doesn't support MMIO emulation at present.
      Since we are running paravirtualized guests, this isn't a serious
      restriction.
      
      With the guest running in supervisor mode, most exceptions go straight
      to the guest.  We will never get data or instruction storage or segment
      interrupts, alignment interrupts, decrementer interrupts, program
      interrupts, single-step interrupts, etc., coming to the hypervisor from
      the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
      exception entry path so that we don't have to do the KVM test on entry
      to those exception handlers.
      
      We do however get hypervisor decrementer, hypervisor data storage,
      hypervisor instruction storage, and hypervisor emulation assist
      interrupts, so we have to handle those.
      
      In hypervisor mode, real-mode accesses can access all of RAM, not just
      a limited amount.  Therefore we put all the guest state in the vcpu.arch
      and use the shadow_vcpu in the PACA only for temporary scratch space.
      We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
      anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
      We don't have a shared page with the guest, but we still need a
      kvm_vcpu_arch_shared struct to store the values of various registers,
      so we include one in the vcpu_arch struct.
      
      The POWER7 processor has a restriction that all threads in a core have
      to be in the same partition.  MMU-on kernel code counts as a partition
      (partition 0), so we have to do a partition switch on every entry to and
      exit from the guest.  At present we require the host and guest to run
      in single-thread mode because of this hardware restriction.
      
      This code allocates a hashed page table for the guest and initializes
      it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
      require that the guest memory is allocated using 16MB huge pages, in
      order to simplify the low-level memory management.  This also means that
      we can get away without tracking paging activity in the host for now,
      since huge pages can't be paged or swapped.
      
      This also adds a few new exports needed by the book3s_hv code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      de56a948
    • P
      KVM: PPC: Split out code from book3s.c into book3s_pr.c · f05ed4d5
      Paul Mackerras 提交于
      In preparation for adding code to enable KVM to use hypervisor mode
      on 64-bit Book 3S processors, this splits book3s.c into two files,
      book3s.c and book3s_pr.c, where book3s_pr.c contains the code that is
      specific to running the guest in problem state (user mode) and book3s.c
      contains code which should apply to all Book 3S processors.
      
      In doing this, we abstract some details, namely the interrupt offset,
      updating the interrupt pending flag, and detecting if the guest is
      in a critical section.  These are all things that will be different
      when we use hypervisor mode.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f05ed4d5
  12. 13 10月, 2010 1 次提交
  13. 01 8月, 2010 1 次提交
  14. 17 5月, 2010 3 次提交
  15. 25 4月, 2010 2 次提交
    • A
      KVM: PPC: Implement Paired Single emulation · 831317b6
      Alexander Graf 提交于
      The one big thing about the Gekko is paired singles.
      
      Paired singles are an extension to the instruction set, that adds 32 single
      precision floating point registers (qprs), some SPRs to modify the behavior
      of paired singled operations and instructions to deal with qprs to the
      instruction set.
      
      Unfortunately, it also changes semantics of existing operations that affect
      single values in FPRs. In most cases they get mirrored to the coresponding
      QPR.
      
      Thanks to that we need to emulate all FPU operations and all the new paired
      single operations too.
      
      In order to achieve that, we use the just introduced FPU call helpers to
      call the real FPU whenever the guest wants to modify an FPR. Additionally
      we also fix up the QPR values along the way.
      
      That way we can execute paired single FPU operations without implementing a
      soft fpu.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      831317b6
    • A
      KVM: PPC: Add helpers to call FPU instructions · 963cf3dc
      Alexander Graf 提交于
      To emulate paired single instructions, we need to be able to call FPU
      operations from within the kernel. Since we don't want gcc to spill
      arbitrary FPU code everywhere, we tell it to use a soft fpu.
      
      Since we know we can really call the FPU in safe areas, let's also add
      some calls that we can later use to actually execute real world FPU
      operations on the host's FPU.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      963cf3dc
  16. 05 11月, 2009 1 次提交
  17. 10 9月, 2009 2 次提交
  18. 16 6月, 2009 1 次提交
    • M
      powerpc: Add configurable -Werror for arch/powerpc · ba55bd74
      Michael Ellerman 提交于
      Add the option to build the code under arch/powerpc with -Werror.
      
      The intention is to make it harder for people to inadvertantly introduce
      warnings in the arch/powerpc code. It needs to be configurable so that
      if a warning is introduced, people can easily work around it while it's
      being fixed.
      
      The option is a negative, ie. don't enable -Werror, so that it will be
      turned on for allyes and allmodconfig builds.
      
      The default is n, in the hope that developers will build with -Werror,
      that will probably lead to some build breaks, I am prepared to be flamed.
      
      It's not enabled for math-emu, which is a steaming pile of warnings.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ba55bd74
  19. 24 3月, 2009 2 次提交
  20. 31 12月, 2008 3 次提交
    • H
      KVM: ppc: Implement in-kernel exit timing statistics · 73e75b41
      Hollis Blanchard 提交于
      Existing KVM statistics are either just counters (kvm_stat) reported for
      KVM generally or trace based aproaches like kvm_trace.
      For KVM on powerpc we had the need to track the timings of the different exit
      types. While this could be achieved parsing data created with a kvm_trace
      extension this adds too much overhead (at least on embedded PowerPC) slowing
      down the workloads we wanted to measure.
      
      Therefore this patch adds a in-kernel exit timing statistic to the powerpc kvm
      code. These statistic is available per vm&vcpu under the kvm debugfs directory.
      As this statistic is low, but still some overhead it can be enabled via a
      .config entry and should be off by default.
      
      Since this patch touched all powerpc kvm_stat code anyway this code is now
      merged and simplified together with the exit timing statistic code (still
      working with exit timing disabled in .config).
      Signed-off-by: NChristian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
      Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      73e75b41
    • H
      KVM: ppc: refactor instruction emulation into generic and core-specific pieces · 75f74f0d
      Hollis Blanchard 提交于
      Cores provide 3 emulation hooks, implemented for example in the new
      4xx_emulate.c:
      kvmppc_core_emulate_op
      kvmppc_core_emulate_mtspr
      kvmppc_core_emulate_mfspr
      
      Strictly speaking the last two aren't necessary, but provide for more
      informative error reporting ("unknown SPR").
      
      Long term I'd like to have instruction decoding autogenerated from tables of
      opcodes, and that way we could aggregate universal, Book E, and core-specific
      instructions more easily and without redundant switch statements.
      Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      75f74f0d
    • H
      KVM: ppc: Refactor powerpc.c to relocate 440-specific code · 9dd921cf
      Hollis Blanchard 提交于
      This introduces a set of core-provided hooks. For 440, some of these are
      implemented by booke.c, with the rest in (the new) 44x.c.
      
      Note that these hooks are link-time, not run-time. Since it is not possible to
      build a single kernel for both e500 and 440 (for example), using function
      pointers would only add overhead.
      Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      9dd921cf