1. 18 10月, 2013 1 次提交
  2. 17 10月, 2013 27 次提交
    • A
      kvm: Add struct kvm arg to memslot APIs · 5587027c
      Aneesh Kumar K.V 提交于
      We will use that in the later patch to find the kvm ops handler
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5587027c
    • A
      kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_ops · 699cc876
      Aneesh Kumar K.V 提交于
      This help us to identify whether we are running with hypervisor mode KVM
      enabled. The change is needed so that we can have both HV and PR kvm
      enabled in the same kernel.
      
      If both HV and PR KVM are included, interrupts come in to the HV version
      of the kvmppc_interrupt code, which then jumps to the PR handler,
      renamed to kvmppc_interrupt_pr, if the guest is a PR guest.
      
      Allowing both PR and HV in the same kernel required some changes to
      kvm_dev_ioctl_check_extension(), since the values returned now can't
      be selected with #ifdefs as much as previously. We look at is_hv_enabled
      to return the right value when checking for capabilities.For capabilities that
      are only provided by HV KVM, we return the HV value only if
      is_hv_enabled is true. For capabilities provided by PR KVM but not HV,
      we return the PR value only if is_hv_enabled is false.
      
      NOTE: in later patch we replace is_hv_enabled with a static inline
      function comparing kvm_ppc_ops
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      699cc876
    • A
      kvm: powerpc: book3s: Cleanup interrupt handling code · dd96b2c2
      Aneesh Kumar K.V 提交于
      With this patch if HV is included, interrupts come in to the HV version
      of the kvmppc_interrupt code, which then jumps to the PR handler,
      renamed to kvmppc_interrupt_pr, if the guest is a PR guest. This helps
      in enabling both HV and PR, which we do in later patch
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      dd96b2c2
    • A
      kvm: powerpc: Add kvmppc_ops callback · 3a167bea
      Aneesh Kumar K.V 提交于
      This patch add a new callback kvmppc_ops. This will help us in enabling
      both HV and PR KVM together in the same kernel. The actual change to
      enable them together is done in the later patch in the series.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [agraf: squash in booke changes]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3a167bea
    • A
      kvm: powerpc: book3s: Add a new config variable CONFIG_KVM_BOOK3S_HV_POSSIBLE · 9975f5e3
      Aneesh Kumar K.V 提交于
      This help ups to select the relevant code in the kernel code
      when we later move HV and PR bits as seperate modules. The patch
      also makes the config options for PR KVM selectable
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9975f5e3
    • A
      kvm: powerpc: book3s: pr: Rename KVM_BOOK3S_PR to KVM_BOOK3S_PR_POSSIBLE · 7aa79938
      Aneesh Kumar K.V 提交于
      With later patches supporting PR kvm as a kernel module, the changes
      that has to be built into the main kernel binary to enable PR KVM module
      is now selected via KVM_BOOK3S_PR_POSSIBLE
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      7aa79938
    • B
      KVM: PPC: E500: Add userspace debug stub support · ce11e48b
      Bharat Bhushan 提交于
      This patch adds the debug stub support on booke/bookehv.
      Now QEMU debug stub can use hw breakpoint, watchpoint and
      software breakpoint to debug guest.
      
      This is how we save/restore debug register context when switching
      between guest, userspace and kernel user-process:
      
      When QEMU is running
       -> thread->debug_reg == QEMU debug register context.
       -> Kernel will handle switching the debug register on context switch.
       -> no vcpu_load() called
      
      QEMU makes ioctls (except RUN)
       -> This will call vcpu_load()
       -> should not change context.
       -> Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs
      
      QEMU Makes RUN ioctl
       -> Save thread->debug_reg on STACK
       -> Store thread->debug_reg == vcpu->debug_reg
       -> load thread->debug_reg
       -> RUN VCPU ( So thread points to vcpu context )
      
      Context switch happens When VCPU running
       -> makes vcpu_load() should not load any context
       -> kernel loads the vcpu context as thread->debug_regs points to vcpu context.
      
      On heavyweight_exit
       -> Load the context saved on stack in thread->debug_reg
      
      Currently we do not support debug resource emulation to guest,
      On debug exception, always exit to user space irrespective of
      user space is expecting the debug exception or not. If this is
      unexpected exception (breakpoint/watchpoint event not set by
      userspace) then let us leave the action on user space. This
      is similar to what it was before, only thing is that now we
      have proper exit state available to user space.
      Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ce11e48b
    • B
      KVM: PPC: E500: Using "struct debug_reg" · 547465ef
      Bharat Bhushan 提交于
      For KVM also use the "struct debug_reg" defined in asm/processor.h
      Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      547465ef
    • B
      KVM: PPC: E500: exit to user space on "ehpriv 1" instruction · b12c7841
      Bharat Bhushan 提交于
      "ehpriv 1" instruction is used for setting software breakpoints
      by user space. This patch adds support to exit to user space
      with "run->debug" have relevant information.
      
      As this is the first point we are using run->debug, also defined
      the run->debug structure.
      Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b12c7841
    • B
      powerpc: export debug registers save function for KVM · fc82cf11
      Bharat Bhushan 提交于
      KVM need this function when switching from vcpu to user-space
      thread. My subsequent patch will use this function.
      Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
      Acked-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      fc82cf11
    • B
      powerpc: move debug registers in a structure · 95791988
      Bharat Bhushan 提交于
      This way we can use same data type struct with KVM and
      also help in using other debug related function.
      Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      95791988
    • B
      powerpc: book3e: _PAGE_LENDIAN must be _PAGE_ENDIAN · c199efa2
      Bharat Bhushan 提交于
      For booke3e _PAGE_ENDIAN is not defined. Infact what is defined
      is "_PAGE_LENDIAN" which is wrong and that should be _PAGE_ENDIAN.
      There are no compilation errors as
      arch/powerpc/include/asm/pte-common.h defines _PAGE_ENDIAN to 0
      as it is not defined anywhere.
      Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c199efa2
    • P
      KVM: PPC: Book3S HV: Better handling of exceptions that happen in real mode · 44a3add8
      Paul Mackerras 提交于
      When an interrupt or exception happens in the guest that comes to the
      host, the CPU goes to hypervisor real mode (MMU off) to handle the
      exception but doesn't change the MMU context.  After saving a few
      registers, we then clear the "in guest" flag.  If, for any reason,
      we get an exception in the real-mode code, that then gets handled
      by the normal kernel exception handlers, which turn the MMU on.  This
      is disastrous if the MMU is still set to the guest context, since we
      end up executing instructions from random places in the guest kernel
      with hypervisor privilege.
      
      In order to catch this situation, we define a new value for the "in guest"
      flag, KVM_GUEST_MODE_HOST_HV, to indicate that we are in hypervisor real
      mode with guest MMU context.  If the "in guest" flag is set to this value,
      we branch off to an emergency handler.  For the moment, this just does
      a branch to self to stop the CPU from doing anything further.
      
      While we're here, we define another new flag value to indicate that we
      are in a HV guest, as distinct from a PR guest.  This will be useful
      when we have a kernel that can support both PR and HV guests concurrently.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      44a3add8
    • P
      KVM: PPC: Book3S PR: Use mmu_notifier_retry() in kvmppc_mmu_map_page() · d78bca72
      Paul Mackerras 提交于
      When the MM code is invalidating a range of pages, it calls the KVM
      kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
      kvm_unmap_hva_range(), which arranges to flush all the existing host
      HPTEs for guest pages.  However, the Linux PTEs for the range being
      flushed are still valid at that point.  We are not supposed to establish
      any new references to pages in the range until the ...range_end()
      notifier gets called.  The PPC-specific KVM code doesn't get any
      explicit notification of that; instead, we are supposed to use
      mmu_notifier_retry() to test whether we are or have been inside a
      range flush notifier pair while we have been getting a page and
      instantiating a host HPTE for the page.
      
      This therefore adds a call to mmu_notifier_retry inside
      kvmppc_mmu_map_page().  This call is inside a region locked with
      kvm->mmu_lock, which is the same lock that is called by the KVM
      MMU notifier functions, thus ensuring that no new notification can
      proceed while we are in the locked region.  Inside this region we
      also create the host HPTE and link the corresponding hpte_cache
      structure into the lists used to find it later.  We cannot allocate
      the hpte_cache structure inside this locked region because that can
      lead to deadlock, so we allocate it outside the region and free it
      if we end up not using it.
      
      This also moves the updates of vcpu3s->hpte_cache_count inside the
      regions locked with vcpu3s->mmu_lock, and does the increment in
      kvmppc_mmu_hpte_cache_map() when the pte is added to the cache
      rather than when it is allocated, in order that the hpte_cache_count
      is accurate.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      d78bca72
    • P
      KVM: PPC: Book3S PR: Better handling of host-side read-only pages · 93b159b4
      Paul Mackerras 提交于
      Currently we request write access to all pages that get mapped into the
      guest, even if the guest is only loading from the page.  This reduces
      the effectiveness of KSM because it means that we unshare every page we
      access.  Also, we always set the changed (C) bit in the guest HPTE if
      it allows writing, even for a guest load.
      
      This fixes both these problems.  We pass an 'iswrite' flag to the
      mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether
      the access is a load or a store.  The mmu.xlate() functions now only
      set C for stores.  kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot()
      instead of gfn_to_pfn() so that it can indicate whether we need write
      access to the page, and get back a 'writable' flag to indicate whether
      the page is writable or not.  If that 'writable' flag is clear, we then
      make the host HPTE read-only even if the guest HPTE allowed writing.
      
      This means that we can get a protection fault when the guest writes to a
      page that it has mapped read-write but which is read-only on the host
      side (perhaps due to KSM having merged the page).  Thus we now call
      kvmppc_handle_pagefault() for protection faults as well as HPTE not found
      faults.  In kvmppc_handle_pagefault(), if the access was allowed by the
      guest HPTE and we thus need to install a new host HPTE, we then need to
      remove the old host HPTE if there is one.  This is done with a new
      function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to
      find and remove the old host HPTE.
      
      Since the memslot-related functions require the KVM SRCU read lock to
      be held, this adds srcu_read_lock/unlock pairs around the calls to
      kvmppc_handle_pagefault().
      
      Finally, this changes kvmppc_mmu_book3s_32_xlate_pte() to not ignore
      guest HPTEs that don't permit access, and to return -EPERM for accesses
      that are not permitted by the page protections.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      93b159b4
    • P
      KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cache · 3ff95502
      Paul Mackerras 提交于
      This makes PR KVM allocate its kvm_vcpu structs from the kvm_vcpu_cache
      rather than having them embedded in the kvmppc_vcpu_book3s struct,
      which is allocated with vzalloc.  The reason is to reduce the
      differences between PR and HV KVM in order to make is easier to have
      them coexist in one kernel binary.
      
      With this, the kvm_vcpu struct has a pointer to the kvmppc_vcpu_book3s
      struct.  The pointer to the kvmppc_book3s_shadow_vcpu struct has moved
      from the kvmppc_vcpu_book3s struct to the kvm_vcpu struct, and is only
      present for 32-bit, since it is only used for 32-bit.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: squash in compile fix from Aneesh]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3ff95502
    • P
      KVM: PPC: Book3S PR: Make HPT accesses and updates SMP-safe · 9308ab8e
      Paul Mackerras 提交于
      This adds a per-VM mutex to provide mutual exclusion between vcpus
      for accesses to and updates of the guest hashed page table (HPT).
      This also makes the code use single-byte writes to the HPT entry
      when updating of the reference (R) and change (C) bits.  The reason
      for doing this, rather than writing back the whole HPTE, is that on
      non-PAPR virtual machines, the guest OS might be writing to the HPTE
      concurrently, and writing back the whole HPTE might conflict with
      that.  Also, real hardware does single-byte writes to update R and C.
      
      The new mutex is taken in kvmppc_mmu_book3s_64_xlate() when reading
      the HPT and updating R and/or C, and in the PAPR HPT update hcalls
      (H_ENTER, H_REMOVE, etc.).  Having the mutex means that we don't need
      to use a hypervisor lock bit in the HPT update hcalls, and we don't
      need to be careful about the order in which the bytes of the HPTE are
      updated by those hcalls.
      
      The other change here is to make emulated TLB invalidations (tlbie)
      effective across all vcpus.  To do this we call kvmppc_mmu_pte_vflush
      for all vcpus in kvmppc_ppc_book3s_64_tlbie().
      
      For 32-bit, this makes the setting of the accessed and dirty bits use
      single-byte writes, and makes tlbie invalidate shadow HPTEs for all
      vcpus.
      
      With this, PR KVM can successfully run SMP guests.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9308ab8e
    • P
      KVM: PPC: Book3S PR: Use 64k host pages where possible · c9029c34
      Paul Mackerras 提交于
      Currently, PR KVM uses 4k pages for the host-side mappings of guest
      memory, regardless of the host page size.  When the host page size is
      64kB, we might as well use 64k host page mappings for guest mappings
      of 64kB and larger pages and for guest real-mode mappings.  However,
      the magic page has to remain a 4k page.
      
      To implement this, we first add another flag bit to the guest VSID
      values we use, to indicate that this segment is one where host pages
      should be mapped using 64k pages.  For segments with this bit set
      we set the bits in the shadow SLB entry to indicate a 64k base page
      size.  When faulting in host HPTEs for this segment, we make them
      64k HPTEs instead of 4k.  We record the pagesize in struct hpte_cache
      for use when invalidating the HPTE.
      
      For now we restrict the segment containing the magic page (if any) to
      4k pages.  It should be possible to lift this restriction in future
      by ensuring that the magic 4k page is appropriately positioned within
      a host 64k page.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c9029c34
    • P
      KVM: PPC: Book3S PR: Allow guest to use 64k pages · a4a0f252
      Paul Mackerras 提交于
      This adds the code to interpret 64k HPTEs in the guest hashed page
      table (HPT), 64k SLB entries, and to tell the guest about 64k pages
      in kvm_vm_ioctl_get_smmu_info().  Guest 64k pages are still shadowed
      by 4k pages.
      
      This also adds another hash table to the four we have already in
      book3s_mmu_hpte.c to allow us to find all the PTEs that we have
      instantiated that match a given 64k guest page.
      
      The tlbie instruction changed starting with POWER6 to use a bit in
      the RB operand to indicate large page invalidations, and to use other
      RB bits to indicate the base and actual page sizes and the segment
      size.  64k pages came in slightly earlier, with POWER5++.
      We use one bit in vcpu->arch.hflags to indicate that the emulated
      cpu supports 64k pages, and another to indicate that it has the new
      tlbie definition.
      
      The KVM_PPC_GET_SMMU_INFO ioctl presents a bit of a problem, because
      the MMU capabilities depend on which CPU model we're emulating, but it
      is a VM ioctl not a VCPU ioctl and therefore doesn't get passed a VCPU
      fd.  In addition, commonly-used userspace (QEMU) calls it before
      setting the PVR for any VCPU.  Therefore, as a best effort we look at
      the first vcpu in the VM and return 64k pages or not depending on its
      capabilities.  We also make the PVR default to the host PVR on recent
      CPUs that support 1TB segments (and therefore multiple page sizes as
      well) so that KVM_PPC_GET_SMMU_INFO will include 64k page and 1TB
      segment support on those CPUs.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a4a0f252
    • P
      KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu · a2d56020
      Paul Mackerras 提交于
      Currently PR-style KVM keeps the volatile guest register values
      (R0 - R13, CR, LR, CTR, XER, PC) in a shadow_vcpu struct rather than
      the main kvm_vcpu struct.  For 64-bit, the shadow_vcpu exists in two
      places, a kmalloc'd struct and in the PACA, and it gets copied back
      and forth in kvmppc_core_vcpu_load/put(), because the real-mode code
      can't rely on being able to access the kmalloc'd struct.
      
      This changes the code to copy the volatile values into the shadow_vcpu
      as one of the last things done before entering the guest.  Similarly
      the values are copied back out of the shadow_vcpu to the kvm_vcpu
      immediately after exiting the guest.  We arrange for interrupts to be
      still disabled at this point so that we can't get preempted on 64-bit
      and end up copying values from the wrong PACA.
      
      This means that the accessor functions in kvm_book3s.h for these
      registers are greatly simplified, and are same between PR and HV KVM.
      In places where accesses to shadow_vcpu fields are now replaced by
      accesses to the kvm_vcpu, we can also remove the svcpu_get/put pairs.
      Finally, on 64-bit, we don't need the kmalloc'd struct at all any more.
      
      With this, the time to read the PVR one million times in a loop went
      from 567.7ms to 575.5ms (averages of 6 values), an increase of about
      1.4% for this worse-case test for guest entries and exits.  The
      standard deviation of the measurements is about 11ms, so the
      difference is only marginally significant statistically.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a2d56020
    • P
      KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7 · 388cc6e1
      Paul Mackerras 提交于
      This enables us to use the Processor Compatibility Register (PCR) on
      POWER7 to put the processor into architecture 2.05 compatibility mode
      when running a guest.  In this mode the new instructions and registers
      that were introduced on POWER7 are disabled in user mode.  This
      includes all the VSX facilities plus several other instructions such
      as ldbrx, stdbrx, popcntw, popcntd, etc.
      
      To select this mode, we have a new register accessible through the
      set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT.  Setting
      this to zero gives the full set of capabilities of the processor.
      Setting it to one of the "logical" PVR values defined in PAPR puts
      the vcpu into the compatibility mode for the corresponding
      architecture level.  The supported values are:
      
      0x0f000002	Architecture 2.05 (POWER6)
      0x0f000003	Architecture 2.06 (POWER7)
      0x0f100003	Architecture 2.06+ (POWER7+)
      
      Since the PCR is per-core, the architecture compatibility level and
      the corresponding PCR value are stored in the struct kvmppc_vcore, and
      are therefore shared between all vcpus in a virtual core.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: squash in fix to add missing break statements and documentation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      388cc6e1
    • P
      KVM: PPC: Book3S HV: Add support for guest Program Priority Register · 4b8473c9
      Paul Mackerras 提交于
      POWER7 and later IBM server processors have a register called the
      Program Priority Register (PPR), which controls the priority of
      each hardware CPU SMT thread, and affects how fast it runs compared
      to other SMT threads.  This priority can be controlled by writing to
      the PPR or by use of a set of instructions of the form or rN,rN,rN
      which are otherwise no-ops but have been defined to set the priority
      to particular levels.
      
      This adds code to context switch the PPR when entering and exiting
      guests and to make the PPR value accessible through the SET/GET_ONE_REG
      interface.  When entering the guest, we set the PPR as late as
      possible, because if we are setting a low thread priority it will
      make the code run slowly from that point on.  Similarly, the
      first-level interrupt handlers save the PPR value in the PACA very
      early on, and set the thread priority to the medium level, so that
      the interrupt handling code runs at a reasonable speed.
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4b8473c9
    • P
      KVM: PPC: Book3S HV: Store LPCR value for each virtual core · a0144e2a
      Paul Mackerras 提交于
      This adds the ability to have a separate LPCR (Logical Partitioning
      Control Register) value relating to a guest for each virtual core,
      rather than only having a single value for the whole VM.  This
      corresponds to what real POWER hardware does, where there is a LPCR
      per CPU thread but most of the fields are required to have the same
      value on all active threads in a core.
      
      The per-virtual-core LPCR can be read and written using the
      GET/SET_ONE_REG interface.  Userspace can can only modify the
      following fields of the LPCR value:
      
      DPFD	Default prefetch depth
      ILE	Interrupt little-endian
      TC	Translation control (secondary HPT hash group search disable)
      
      We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
      contains bits relating to memory management, i.e. the Virtualized
      Partition Memory (VPM) bits and the bits relating to guest real mode.
      When this default value is updated, the update needs to be propagated
      to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
      that.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix whitespace]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a0144e2a
    • P
      KVM: PPC: Book3S: Add GET/SET_ONE_REG interface for VRSAVE · c0867fd5
      Paul Mackerras 提交于
      The VRSAVE register value for a vcpu is accessible through the
      GET/SET_SREGS interface for Book E processors, but not for Book 3S
      processors.  In order to make this accessible for Book 3S processors,
      this adds a new register identifier for GET/SET_ONE_REG, and adds
      the code to implement it.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c0867fd5
    • P
      KVM: PPC: Book3S HV: Implement timebase offset for guests · 93b0f4dc
      Paul Mackerras 提交于
      This allows guests to have a different timebase origin from the host.
      This is needed for migration, where a guest can migrate from one host
      to another and the two hosts might have a different timebase origin.
      However, the timebase seen by the guest must not go backwards, and
      should go forwards only by a small amount corresponding to the time
      taken for the migration.
      
      Therefore this provides a new per-vcpu value accessed via the one_reg
      interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
      defaults to 0 and is not modified by KVM.  On entering the guest, this
      value is added onto the timebase, and on exiting the guest, it is
      subtracted from the timebase.
      
      This is only supported for recent POWER hardware which has the TBU40
      (timebase upper 40 bits) register.  Writing to the TBU40 register only
      alters the upper 40 bits of the timebase, leaving the lower 24 bits
      unchanged.  This provides a way to modify the timebase for guest
      migration without disturbing the synchronization of the timebase
      registers across CPU cores.  The kernel rounds up the value given
      to a multiple of 2^24.
      
      Timebase values stored in KVM structures (struct kvm_vcpu, struct
      kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
      values in the dispatch trace log need to be guest timebase values,
      however, since that is read directly by the guest.  This moves the
      setting of vcpu->arch.dec_expires on guest exit to a point after we
      have restored the host timebase so that vcpu->arch.dec_expires is a
      host timebase value.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      93b0f4dc
    • P
      KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registers · 14941789
      Paul Mackerras 提交于
      Currently we are not saving and restoring the SIAR and SDAR registers in
      the PMU (performance monitor unit) on guest entry and exit.  The result
      is that performance monitoring tools in the guest could get false
      information about where a program was executing and what data it was
      accessing at the time of a performance monitor interrupt.  This fixes
      it by saving and restoring these registers along with the other PMU
      registers on guest entry/exit.
      
      This also provides a way for userspace to access these values for a
      vcpu via the one_reg interface.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      14941789
    • M
      KVM: PPC: Book3S HV: Reserve POWER8 space in get/set_one_reg · 3b783474
      Michael Neuling 提交于
      This reserves space in get/set_one_reg ioctl for the extra guest state
      needed for POWER8.  It doesn't implement these at all, it just reserves
      them so that the ABI is defined now.
      
      A few things to note here:
      
      - This add *a lot* state for transactional memory.  TM suspend mode,
        this is unavoidable, you can't simply roll back all transactions and
        store only the checkpointed state.  I've added this all to
        get/set_one_reg (including GPRs) rather than creating a new ioctl
        which returns a struct kvm_regs like KVM_GET_REGS does.  This means we
        if we need to extract the TM state, we are going to need a bucket load
        of IOCTLs.  Hopefully most of the time this will not be needed as we
        can look at the MSR to see if TM is active and only grab them when
        needed.  If this becomes a bottle neck in future we can add another
        ioctl to grab all this state in one go.
      
      - The TM state is offset by 0x80000000.
      
      - For TM, I've done away with VMX and FP and created a single 64x128 bit
        VSX register space.
      
      - I've left a space of 1 (at 0x9c) since Paulus needs to add a value
        which applies to POWER7 as well.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3b783474
  3. 14 10月, 2013 1 次提交
  4. 05 9月, 2013 1 次提交
  5. 28 8月, 2013 1 次提交
    • P
      KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls · 8b23de29
      Paul Mackerras 提交于
      It turns out that if we exit the guest due to a hcall instruction (sc 1),
      and the loading of the instruction in the guest exit path fails for any
      reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the
      instruction after the hcall instruction rather than the hcall itself.
      This in turn means that the instruction doesn't get recognized as an
      hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel
      as a sc instruction.  That usually results in the guest kernel getting
      a return code of 38 (ENOSYS) from an hcall, which often triggers a
      BUG_ON() or other failure.
      
      This fixes the problem by adding a new variant of kvmppc_get_last_inst()
      called kvmppc_get_last_sc(), which fetches the instruction if necessary
      from pc - 4 rather than pc.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8b23de29
  6. 27 8月, 2013 3 次提交
  7. 24 8月, 2013 2 次提交
  8. 21 8月, 2013 3 次提交
    • S
      of: move of_get_cpu_node implementation to DT core library · 183912d3
      Sudeep KarkadaNagesha 提交于
      This patch moves the generalized implementation of of_get_cpu_node from
      PowerPC to DT core library, thereby adding support for retrieving cpu
      node for a given logical cpu index on any architecture.
      
      The CPU subsystem can now use this function to assign of_node in the
      cpu device while registering CPUs.
      
      It is recommended to use these helper function only in pre-SMP/early
      initialisation stages to retrieve CPU device node pointers in logical
      ordering. Once the cpu devices are registered, it can be retrieved easily
      from cpu device of_node which avoids unnecessary parsing and matching.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Acked-by: NRob Herring <rob.herring@calxeda.com>
      Signed-off-by: NSudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
      183912d3
    • S
      powerpc: Convert some mftb/mftbu into mfspr · beb2dc0a
      Scott Wood 提交于
      Some CPUs (such as e500v1/v2) don't implement mftb and will take a
      trap.  mfspr should work on everything that has a timebase, and is the
      preferred instruction according to ISA v2.06.
      
      Currently we get away with mftb on 85xx because the assembler converts
      it to mfspr due to -Wa,-me500.  However, that flag has other effects
      that are undesireable for certain targets (e.g.  lwsync is converted to
      sync), and is hostile to multiplatform kernels.  Thus we would like to
      stop setting it for all e500-family builds.
      
      mftb/mftbu instances which are in 85xx code or common code are
      converted.  Instances which will never run on 85xx are left alone.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      beb2dc0a
    • S
      powerpc/fsl-booke: Work around erratum A-006958 · d52459ca
      Scott Wood 提交于
      Erratum A-006598 says that 64-bit mftb is not atomic -- it's subject
      to a similar race condition as doing mftbu/mftbl on 32-bit.  The lower
      half of timebase is updated before the upper half; thus, we can share
      the workaround for a similar bug on Cell.  This workaround involves
      looping if the lower half of timebase is zero, thus avoiding the need
      for a scratch register (other than CR0).  This workaround must be
      avoided when the timebase is frozen, such as during the timebase sync
      code.
      
      This deals with kernel and vdso accesses, but other userspace accesses
      will of course need to be fixed elsewhere.
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      d52459ca
  9. 14 8月, 2013 1 次提交
    • F
      vtime: Describe overriden functions in dedicated arch headers · a5725ac2
      Frederic Weisbecker 提交于
      If the arch overrides some generic vtime APIs, let it describe
      these on a dedicated and standalone header. This way it becomes
      convenient to include it in vtime generic headers without irrelevant
      stuff in such a low level header.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      a5725ac2