1. 17 Oct, 2013 9 commits
    • KVM: PPC: Book3S HV: Store LPCR value for each virtual core · a0144e2a
      Paul Mackerras committed
      This adds the ability to have a separate LPCR (Logical Partitioning
      Control Register) value relating to a guest for each virtual core,
      rather than only having a single value for the whole VM.  This
      corresponds to what real POWER hardware does, where there is a LPCR
      per CPU thread but most of the fields are required to have the same
      value on all active threads in a core.
      
      The per-virtual-core LPCR can be read and written using the
      GET/SET_ONE_REG interface.  Userspace can only modify the
      following fields of the LPCR value:
      
      DPFD	Default prefetch depth
      ILE	Interrupt little-endian
      TC	Translation control (secondary HPT hash group search disable)
      
      We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
      contains bits relating to memory management, i.e. the Virtualized
      Partition Memory (VPM) bits and the bits relating to guest real mode.
      When this default value is updated, the update needs to be propagated
      to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
      that.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix whitespace]
      Signed-off-by: Alexander Graf <agraf@suse.de>
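The field restriction described in the LPCR commit above can be pictured as a mask-and-merge. This is a minimal sketch, not the kernel's code: the helper name `ex_set_lpcr` is hypothetical, and the bit positions are illustrative assumptions that are not stated in the commit message.

```c
#include <stdint.h>

/* Illustrative bit positions for the three userspace-writable fields.
 * These values are assumptions for the example, not quoted from the commit. */
#define EX_LPCR_DPFD (UINT64_C(7) << 52)   /* Default prefetch depth */
#define EX_LPCR_ILE  UINT64_C(0x02000000)  /* Interrupt little-endian */
#define EX_LPCR_TC   UINT64_C(0x00000200)  /* Translation control */
#define EX_LPCR_USER_MASK (EX_LPCR_DPFD | EX_LPCR_ILE | EX_LPCR_TC)

/* Hypothetical helper: merge a userspace-supplied value into the current
 * per-vcore LPCR, changing only the permitted fields. */
static uint64_t ex_set_lpcr(uint64_t current, uint64_t requested)
{
    return (current & ~EX_LPCR_USER_MASK) | (requested & EX_LPCR_USER_MASK);
}
```

Bits outside the mask, such as the memory-management bits kept in the per-VM default, pass through unchanged whatever userspace requests.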
    • KVM: PPC: BookE: Add GET/SET_ONE_REG interface for VRSAVE · 8b75cbbe
      Paul Mackerras committed
      This makes the VRSAVE register value for a vcpu accessible through
      the GET/SET_ONE_REG interface on Book E systems (in addition to the
      existing GET/SET_SREGS interface), for consistency with Book 3S.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Avoid unbalanced increments of VPA yield count · 8c2dbb79
      Paul Mackerras committed
      The yield count in the VPA is supposed to be incremented every time
      we enter the guest, and every time we exit the guest, so that its
      value is even when the vcpu is running in the guest and odd when it
      isn't.  However, it's currently possible that we increment the yield
      count on the way into the guest but then find that other CPU threads
      are already exiting the guest, so we go back to nap mode via the
      secondary_too_late label.  In this situation we don't increment the
      yield count again, breaking the relationship between the LSB of the
      count and whether the vcpu is in the guest.
      
      To fix this, we move the increment of the yield count to a point
      after we have checked whether other CPU threads are exiting.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
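The ordering fix in the yield-count commit above can be modelled in a few lines. This is a simplified sketch with hypothetical names (`ex_vpa`, `ex_try_enter_guest`, `ex_exit_guest`); the real code is assembly in the guest entry path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the part of the VPA that holds the yield count. */
struct ex_vpa {
    uint32_t yield_count;   /* even: vcpu in guest, odd: vcpu not in guest */
};

/* Attempt to enter the guest. The count is incremented only after the check
 * for other threads already exiting, so a bail-out leaves it untouched. */
static bool ex_try_enter_guest(struct ex_vpa *vpa, bool others_exiting)
{
    if (others_exiting)
        return false;       /* models the secondary_too_late path */
    vpa->yield_count++;     /* odd -> even: now in the guest */
    return true;
}

/* Leave the guest: even -> odd. */
static void ex_exit_guest(struct ex_vpa *vpa)
{
    vpa->yield_count++;
}
```

Starting from an odd count, a failed entry leaves the count odd, and every successful entry is paired with exactly one exit, so the low bit keeps tracking whether the vcpu is in the guest.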
    • KVM: PPC: Book3S HV: Pull out interrupt-reading code into a subroutine · c934243c
      Paul Mackerras committed
      This moves the code in book3s_hv_rmhandlers.S that reads any pending
      interrupt from the XICS interrupt controller, and works out whether
      it is an IPI for the guest, an IPI for the host, or a device interrupt,
      into a new function called kvmppc_read_intr.  Later patches will
      need this.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Restructure kvmppc_hv_entry to be a subroutine · 218309b7
      Paul Mackerras committed
      We have two paths into and out of the low-level guest entry and exit
      code: from a vcpu task via kvmppc_hv_entry_trampoline, and from the
      system reset vector for an offline secondary thread on POWER7 via
      kvm_start_guest.  Currently both just branch to kvmppc_hv_entry to
      enter the guest, and on guest exit, we test the vcpu physical thread
      ID to detect which way we came in and thus whether we should return
      to the vcpu task or go back to nap mode.
      
      In order to make the code flow clearer, and to keep the code relating
      to each flow together, this turns kvmppc_hv_entry into a subroutine
      that follows the normal conventions for call and return.  This means
      that kvmppc_hv_entry_trampoline() and kvmppc_hv_entry() now establish
      normal stack frames, and we use the normal stack slots for saving
      return addresses rather than local_paca->kvm_hstate.vmhandler.  Apart
      from that this is mostly moving code around unchanged.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Implement H_CONFER · 42d7604d
      Paul Mackerras committed
      The H_CONFER hypercall is used when a guest vcpu is spinning on a lock
      held by another vcpu which has been preempted, and the spinning vcpu
      wishes to give its timeslice to the lock holder.  We implement this
      in the straightforward way using kvm_vcpu_yield_to().
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S: Add GET/SET_ONE_REG interface for VRSAVE · c0867fd5
      Paul Mackerras committed
      The VRSAVE register value for a vcpu is accessible through the
      GET/SET_SREGS interface for Book E processors, but not for Book 3S
      processors.  In order to make this accessible for Book 3S processors,
      this adds a new register identifier for GET/SET_ONE_REG, and adds
      the code to implement it.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Implement timebase offset for guests · 93b0f4dc
      Paul Mackerras committed
      This allows guests to have a different timebase origin from the host.
      This is needed for migration, where a guest can migrate from one host
      to another and the two hosts might have a different timebase origin.
      However, the timebase seen by the guest must not go backwards, and
      should go forwards only by a small amount corresponding to the time
      taken for the migration.
      
      Therefore this provides a new per-vcpu value accessed via the one_reg
      interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
      defaults to 0 and is not modified by KVM.  On entering the guest, this
      value is added onto the timebase, and on exiting the guest, it is
      subtracted from the timebase.
      
      This is only supported for recent POWER hardware which has the TBU40
      (timebase upper 40 bits) register.  Writing to the TBU40 register only
      alters the upper 40 bits of the timebase, leaving the lower 24 bits
      unchanged.  This provides a way to modify the timebase for guest
      migration without disturbing the synchronization of the timebase
      registers across CPU cores.  The kernel rounds up the value given
      to a multiple of 2^24.
      
      Timebase values stored in KVM structures (struct kvm_vcpu, struct
      kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
      values in the dispatch trace log need to be guest timebase values,
      however, since that is read directly by the guest.  This moves the
      setting of vcpu->arch.dec_expires on guest exit to a point after we
      have restored the host timebase so that vcpu->arch.dec_expires is a
      host timebase value.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
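The rounding rule from the timebase-offset commit above (the offset is rounded up to a multiple of 2^24 because TBU40 writes leave the low 24 bits alone) can be sketched as follows. The function names are hypothetical and the arithmetic is a plain illustration, not the kernel's implementation.

```c
#include <stdint.h>

#define EX_TB_GRANULE (UINT64_C(1) << 24)   /* low 24 bits are not writable via TBU40 */

/* Round a requested offset up to the next multiple of 2^24. */
static uint64_t ex_round_tb_offset(uint64_t offset)
{
    return (offset + EX_TB_GRANULE - 1) & ~(EX_TB_GRANULE - 1);
}

/* On guest entry the offset is added to the timebase... */
static uint64_t ex_guest_tb(uint64_t host_tb, uint64_t offset)
{
    return host_tb + offset;
}

/* ...and on guest exit it is subtracted again. */
static uint64_t ex_host_tb(uint64_t guest_tb, uint64_t offset)
{
    return guest_tb - offset;
}
```

Because the rounded offset has zero low bits, adding or subtracting it never changes the low 24 bits of the timebase, which is what keeps the cores synchronized.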
    • KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registers · 14941789
      Paul Mackerras committed
      Currently we are not saving and restoring the SIAR and SDAR registers in
      the PMU (performance monitor unit) on guest entry and exit.  The result
      is that performance monitoring tools in the guest could get false
      information about where a program was executing and what data it was
      accessing at the time of a performance monitor interrupt.  This fixes
      it by saving and restoring these registers along with the other PMU
      registers on guest entry/exit.
      
      This also provides a way for userspace to access these values for a
      vcpu via the one_reg interface.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  2. 04 Sep, 2013 1 commit
  3. 29 Aug, 2013 1 commit
  4. 28 Aug, 2013 5 commits
    • KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls · 8b23de29
      Paul Mackerras committed
      It turns out that if we exit the guest due to a hcall instruction (sc 1),
      and the loading of the instruction in the guest exit path fails for any
      reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the
      instruction after the hcall instruction rather than the hcall itself.
      This in turn means that the instruction doesn't get recognized as an
      hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel
      as a sc instruction.  That usually results in the guest kernel getting
      a return code of 38 (ENOSYS) from an hcall, which often triggers a
      BUG_ON() or other failure.
      
      This fixes the problem by adding a new variant of kvmppc_get_last_inst()
      called kvmppc_get_last_sc(), which fetches the instruction if necessary
      from pc - 4 rather than pc.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
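The address adjustment described in the instruction-fetch commit above amounts to choosing between pc and pc - 4. A minimal sketch, with a hypothetical helper name and a flag standing in for "the exit was caused by an sc instruction":

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pick the address from which to re-fetch the
 * instruction that caused the guest exit. After an sc, the saved pc
 * already points at the following instruction, so step back by one
 * 4-byte instruction. */
static uint64_t ex_last_inst_addr(uint64_t pc, bool exited_on_sc)
{
    return exited_on_sc ? pc - 4 : pc;
}
```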
    • KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX · 9d1ffdd8
      Paul Mackerras committed
      Currently the code assumes that once we load up guest FP/VSX or VMX
      state into the CPU, it stays valid in the CPU registers until we
      explicitly flush it to the thread_struct.  However, on POWER7,
      copy_page() and memcpy() can use VMX.  These functions do flush the
      VMX state to the thread_struct before using VMX instructions, but if
      this happens while we have guest state in the VMX registers, and we
      then re-enter the guest, we don't reload the VMX state from the
      thread_struct, leading to guest corruption.  This has been observed
      to cause guest processes to segfault.
      
      To fix this, we check before re-entering the guest that all of the
      bits corresponding to facilities owned by the guest, as expressed
      in vcpu->arch.guest_owned_ext, are set in current->thread.regs->msr.
      Any bits that have been cleared correspond to facilities that have
      been used by kernel code and thus flushed to the thread_struct, so
      for them we reload the state from the thread_struct.
      
      We also need to check current->thread.regs->msr before calling
      giveup_fpu() or giveup_altivec(), since if the relevant bit is
      clear, the state has already been flushed to the thread_struct and
      to flush it again would corrupt it.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
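The re-entry check in the VMX commit above reduces to a bitmask comparison between the facilities the guest owns and the facility bits still set in the task's MSR. A minimal sketch; the helper name is hypothetical and the MSR bit values are illustrative assumptions, not quoted from the commit.

```c
#include <stdint.h>

/* Illustrative MSR facility bits (assumed values for the example). */
#define EX_MSR_FP  UINT64_C(0x00002000)
#define EX_MSR_VSX UINT64_C(0x00800000)
#define EX_MSR_VEC UINT64_C(0x02000000)

/* Hypothetical helper: return the facilities that the guest owns but whose
 * MSR bit has been cleared, meaning kernel code used them and the state was
 * flushed to the thread_struct. These must be reloaded before re-entry. */
static uint64_t ex_lost_facilities(uint64_t guest_owned_ext, uint64_t thread_msr)
{
    return guest_owned_ext & ~thread_msr;
}
```

The same test, inverted, guards the give-up path: if a facility's MSR bit is already clear, its state has been flushed and must not be flushed again.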
    • KVM: PPC: Book3S: Fix compile error in XICS emulation · 7bfa9ad5
      Paul Mackerras committed
      Commit 8e44ddc3 ("powerpc/kvm/book3s: Add support for H_IPOLL and
      H_XIRR_X in XICS emulation") added a call to get_tb() but didn't
      include the header that defines it, and on some configs this means
      book3s_xics.c fails to compile:
      
      arch/powerpc/kvm/book3s_xics.c: In function ‘kvmppc_xics_hcall’:
      arch/powerpc/kvm/book3s_xics.c:812:3: error: implicit declaration of function ‘get_tb’ [-Werror=implicit-function-declaration]
      
      Cc: stable@vger.kernel.org [v3.10, v3.11]
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S PR: return appropriate error when allocation fails · 7c7b406e
      Thadeu Lima de Souza Cascardo committed
      err was overwritten by a previous function call, and checked to be 0. If
      the following page allocation fails, 0 is going to be returned instead
      of -ENOMEM.
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
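The bug pattern described in the allocation-failure commit above is a stale `err` of 0 being returned after a later allocation fails. The sketch below shows the corrected shape with hypothetical names throughout (`ex_earlier_step`, `ex_create`, `ex_failing_alloc`); it is not the kernel function itself.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical earlier initialization step that succeeds, leaving err == 0. */
static int ex_earlier_step(void)
{
    return 0;
}

/* Hypothetical creation routine. The allocator is passed in so the failure
 * path can be exercised. */
static int ex_create(void *(*alloc_page)(void), void **out)
{
    int err = ex_earlier_step();
    if (err)
        return err;

    void *page = alloc_page();
    if (!page)
        return -ENOMEM;   /* set explicitly; returning the stale err would yield 0 */

    *out = page;
    return 0;
}

/* Allocator that always fails, for demonstration. */
static void *ex_failing_alloc(void)
{
    return NULL;
}
```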
    • arch: powerpc: kvm: add signed type cast for comparation · 5d226ae5
      Chen Gang committed
      'rmls' is 'unsigned long', but lpcr_rmls() returns a negative number when
      a failure occurs, so a type cast is needed for the comparison.
      
      'lpid' is 'unsigned long', but kvmppc_alloc_lpid() returns a negative number
      when a failure occurs, so a type cast is needed for the comparison.
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
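The signed-cast commit above addresses a common C pitfall: a negative error code stored in an unsigned variable compares as a huge positive value. A minimal sketch with hypothetical names; the cast back to `long` relies on the two's-complement behaviour that all common compilers provide.

```c
#include <errno.h>

/* Hypothetical allocator that returns a small id, or a negative errno. */
static long ex_alloc_id(int fail)
{
    return fail ? -ENOMEM : 5;
}

/* With an unsigned long argument, a plain "v < 0" can never be true.
 * Casting to long first restores the intended error check. */
static int ex_is_error(unsigned long v)
{
    return (long)v < 0;
}
```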
  5. 26 Aug, 2013 1 commit
  6. 23 Aug, 2013 1 commit
  7. 14 Aug, 2013 3 commits
  8. 09 Aug, 2013 2 commits
  9. 31 Jul, 2013 1 commit
  10. 25 Jul, 2013 1 commit
  11. 18 Jul, 2013 1 commit
  12. 11 Jul, 2013 2 commits
  13. 10 Jul, 2013 2 commits
    • KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers · 4baa1d87
      Paul Mackerras committed
      The table of offsets to real-mode hcall handlers in book3s_hv_rmhandlers.S
      can contain negative values, if some of the handlers end up before the
      table in the vmlinux binary.  Thus we need to use a sign-extending load
      to read the values in the table rather than a zero-extending load.
      Without this, the host crashes when the guest does one of the hcalls
      with negative offsets, due to jumping to a bogus address.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
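The difference between the two kinds of load in the hcall-table commit above can be shown in C. This sketch assumes 32-bit table entries, which the commit message does not state, and uses hypothetical function names.

```c
#include <stdint.h>
#include <string.h>

/* Sign-extending load: a negative 32-bit offset stays negative in 64 bits. */
static int64_t ex_load_sign_extend(const void *entry)
{
    int32_t v;
    memcpy(&v, entry, sizeof v);
    return (int64_t)v;
}

/* Zero-extending load: the same bit pattern becomes a large positive value,
 * which would send the branch to a bogus address. */
static int64_t ex_load_zero_extend(const void *entry)
{
    uint32_t v;
    memcpy(&v, entry, sizeof v);
    return (int64_t)v;
}
```

For an entry holding -16, the first function yields -16 while the second yields 4294967280, so adding the result to the table base lands far past the intended handler.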
    • KVM: PPC: Book3S HV: Correct tlbie usage · 54480501
      Paul Mackerras committed
      This corrects the usage of the tlbie (TLB invalidate entry) instruction
      in HV KVM.  The tlbie instruction changed between PPC970 and POWER7.
      On the PPC970, the bit to select large vs. small page is in the instruction,
      not in the RB register value.  This changes the code to use the correct
      form on PPC970.
      
      On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
      field of the RB value incorrectly for 64k pages.  This fixes it.
      
      Since we now have several cases to handle for the tlbie instruction, this
      factors out the code to do a sequence of tlbies into a new function,
      do_tlbies(), and calls that from the various places where the code was
      doing tlbie instructions inline.  It also makes kvmppc_h_bulk_remove()
      use the same global_invalidates() function for determining whether to do
      local or global TLB invalidations as is used in other places, for
      consistency, and also to make sure that kvm->arch.need_tlb_flush gets
      updated properly.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  14. 08 Jul, 2013 4 commits
  15. 30 Jun, 2013 6 commits
    • KVM: PPC: Ignore PIR writes · a3ff5fbc
      Alexander Graf committed
      While technically it's legal to write to PIR and have the identifier changed,
      we don't implement logic to do so because we simply expose vcpu_id to the guest.
      
      So instead, let's ignore writes to PIR. This ensures that we don't inject faults
      into the guest for something the guest is allowed to do. While at it, we cross
      our fingers hoping that it also doesn't mind that we broke its PIR read values.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S PR: Invalidate SLB entries properly · 681562cd
      Paul Mackerras committed
      At present, if the guest creates a valid SLB (segment lookaside buffer)
      entry with the slbmte instruction, then invalidates it with the slbie
      instruction, then reads the entry with the slbmfee/slbmfev instructions,
      the result of the slbmfee will have the valid bit set, even though the
      entry is not actually considered valid by the host.  This is confusing,
      if not worse.  This fixes it by zeroing out the orige and origv fields
      of the SLB entry structure when the entry is invalidated.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S PR: Allow guest to use 1TB segments · 0f296829
      Paul Mackerras committed
      With this, the guest can use 1TB segments as well as 256MB segments.
      Since we now have the situation where a single emulated guest segment
      could correspond to multiple shadow segments (as the shadow segments
      are still 256MB segments), this adds a new kvmppc_mmu_flush_segment()
      to scan for all shadow segments that need to be removed.
      
      This restructures the guest HPT (hashed page table) lookup code to
      use the correct hashing and matching functions for HPTEs within a
      1TB segment.  We use the standard hpt_hash() function instead of
      open-coding the hash calculation, and we use HPTE_V_COMPARE() with
      an AVPN value that has the B (segment size) field included.  The
      calculation of avpn is done a little earlier since it doesn't change
      in the loop starting at the do_second label.
      
      The computation in kvmppc_mmu_book3s_64_esid_to_vsid() changes so that
      it returns a 256MB VSID even if the guest SLB entry is a 1TB entry.
      This is because the users of this function are creating 256MB SLB
      entries.  We set a new VSID_1T flag so that entries created from 1T
      segments don't collide with entries from 256MB segments.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match · 6ed1485f
      Paul Mackerras committed
      The loop in kvmppc_mmu_book3s_64_xlate() that looks up a translation
      in the guest hashed page table (HPT) keeps going if it finds an
      HPTE that matches but doesn't allow access.  This is incorrect; it
      is different from what the hardware does, and there should never be
      more than one matching HPTE anyway.  This fixes it to stop when any
      matching HPTE is found.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
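The lookup change in the HPTEG-scan commit above is that the scan stops at the first matching entry whether or not that entry permits the access. A simplified sketch with a hypothetical entry type; real HPTEs are matched on AVPN and flag bits rather than a boolean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one entry in a page table entry group. */
struct ex_hpte {
    bool matches;        /* entry matches the address being translated */
    bool allows_access;  /* entry's protection bits permit the access */
};

/* Return the index of the first matching entry, or -1 if none matches.
 * *permitted is set from that entry; later entries are never examined. */
static int ex_find_hpte(const struct ex_hpte *group, size_t n, bool *permitted)
{
    for (size_t i = 0; i < n; i++) {
        if (!group[i].matches)
            continue;
        *permitted = group[i].allows_access;
        return (int)i;   /* stop here even if access is denied */
    }
    return -1;
}
```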
    • KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry · bc1bc4e3
      Paul Mackerras committed
      On entering a PR KVM guest, we invalidate the whole SLB before loading
      up the guest entries.  We do this using an slbia instruction, which
      invalidates all entries except entry 0, followed by an slbie to
      invalidate entry 0.  However, the slbie turns out to be ineffective
      in some circumstances (specifically when the host linear mapping uses
      64k pages) because of errors in computing the parameter to the slbie.
      The result is that the guest kernel hangs very early in boot because
      it takes a DSI the first time it tries to access kernel data using
      a linear mapping address in real mode.
      
      Currently we construct bits 36 - 43 (big-endian numbering) of the slbie
      parameter by taking bits 56 - 63 of the SLB VSID doubleword.  These bits
      for the slbie are C (class, 1 bit), B (segment size, 2 bits) and 5
      reserved bits.  For the SLB VSID doubleword these are C (class, 1 bit),
      reserved (1 bit), LP (large page size, 2 bits), and 4 reserved bits.
      Thus we are not setting the B field correctly, and when LP = 01 as
      it is for 64k pages, we are setting a reserved bit.
      
      Rather than add more instructions to calculate the slbie parameter
      correctly, this takes a simpler approach, which is to set entry 0 to
      zeroes explicitly.  Normally slbmte should not be used to invalidate
      an entry, since it doesn't invalidate the ERATs, but it is OK to use
      it to invalidate an entry if it is immediately followed by slbia,
      which does invalidate the ERATs.  (This has been confirmed with the
      Power architects.)  This approach takes fewer instructions and will
      work whatever the contents of entry 0.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S PR: Fix proto-VSID calculations · 8ed7b7e9
      Paul Mackerras committed
      This makes sure the calculation of the proto-VSIDs used by PR KVM
      is done with 64-bit arithmetic.  Since vcpu3s->context_id[] is int,
      when we do vcpu3s->context_id[0] << ESID_BITS the shift will be done
      with 32-bit instructions, possibly leading to significant bits
      getting lost, as the context id can be up to 524283 and ESID_BITS is
      18.  To fix this we cast the context id to u64 before shifting.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
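The overflow described in the proto-VSID commit above can be reproduced with the numbers from the message (context id 524283, ESID_BITS 18). The function names are hypothetical, and the 32-bit case is modelled with an unsigned type so that the wraparound is well defined in C rather than undefined behaviour.

```c
#include <stdint.h>

#define EX_ESID_BITS 18

/* Widen to 64 bits before shifting, as the fix does: no bits are lost. */
static uint64_t ex_shift_64(int context_id)
{
    return (uint64_t)context_id << EX_ESID_BITS;
}

/* Shift in 32-bit arithmetic: bits above bit 31 are discarded. */
static uint64_t ex_shift_32(int context_id)
{
    return (uint32_t)context_id << EX_ESID_BITS;
}
```

With a context id of 524283, the 64-bit shift gives 137437642752, while the 32-bit shift keeps only the low 32 bits, 4293656576.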