1. 30 5月, 2014 19 次提交
    • A
      KVM: PPC: Book3S PR: Expose TAR facility to guest · e14e7a1e
      Alexander Graf 提交于
      POWER8 implements a new register called TAR. This register has to be
      enabled in FSCR and then from KVM's point of view is mere storage.
      
      This patch enables the guest to use TAR.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      e14e7a1e
    • A
      KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR · 616dff86
      Alexander Graf 提交于
      POWER8 introduced a new interrupt type called "Facility unavailable interrupt"
      which contains its status message in a new register called FSCR.
      
      Handle these exits and try to emulate instructions for unhandled facilities.
      Follow-on patches enable KVM to expose specific facilities into the guest.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      616dff86
    • A
      KVM: PPC: Book3S PR: Emulate TIR register · a5948fa0
      Alexander Graf 提交于
      In parallel to the Processor ID Register (PIR) threaded POWER8 also adds a
      Thread ID Register (TIR). Since PR KVM doesn't emulate more than one thread
      per core, we can just always expose 0 here.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a5948fa0
    • A
      KVM: PPC: Book3S PR: Ignore PMU SPRs · f8f6eb0d
      Alexander Graf 提交于
      When we expose a POWER8 CPU into the guest, it will start accessing PMU SPRs
      that we don't emulate. Just ignore accesses to them.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f8f6eb0d
    • A
      KVM: PPC: Book3S: Move little endian conflict to HV KVM · f24bc1ed
      Alexander Graf 提交于
      With the previous patches applied, we can now successfully use PR KVM on
      little endian hosts which means we can now allow users to select it.
      
      However, HV KVM still needs some work, so let's keep the kconfig conflict
      on that one.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f24bc1ed
    • A
      KVM: PPC: Book3S PR: Do dcbz32 patching with big endian instructions · cd087eef
      Alexander Graf 提交于
      When the host CPU we're running on doesn't support dcbz32 itself, but the
      guest wants to have dcbz only clear 32 bytes of data, we loop through every
      executable mapped page to search for dcbz instructions and patch them with
      a special privileged instruction that we emulate as dcbz32.
      
      The only guests that want to see dcbz act as 32byte are book3s_32 guests, so
      we don't have to worry about little endian instruction ordering. So let's
      just always search for big endian dcbz instructions, also when we're on a
      little endian host.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      cd087eef
    • A
      KVM: PPC: Make shared struct aka magic page guest endian · 5deb8e7a
      Alexander Graf 提交于
      The shared (magic) page is a data structure that contains often used
      supervisor privileged SPRs accessible via memory to the user to reduce
      the number of exits we have to take to read/write them.
      
      When we actually share this structure with the guest we have to maintain
      it in guest endianness, because some of the patch tricks only work with
      native endian load/store operations.
      
      Since we only share the structure with either host or guest in little
      endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv.
      
      For booke, the shared struct stays big endian. For book3s_64 hv we maintain
      the struct in host native endian, since it never gets shared with the guest.
      
      For book3s_64 pr we introduce a variable that tells us which endianness the
      shared struct is in and route every access to it through helper inline
      functions that evaluate this variable.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5deb8e7a
    • A
      KVM: PPC: PR: Fill pvinfo hcall instructions in big endian · 2743103f
      Alexander Graf 提交于
      We expose a blob of hypercall instructions to user space that it gives to
      the guest via device tree again. That blob should contain a stream of
      instructions necessary to do a hypercall in big endian, as it just gets
      passed into the guest and old guests use them straight away.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      2743103f
    • A
      KVM: PPC: Book3S PR: PAPR: Access RTAS in big endian · b59d9d26
      Alexander Graf 提交于
      When the guest does an RTAS hypercall it keeps all RTAS variables inside a
      big endian data structure.
      
      To make sure we don't have to bother about endianness inside the actual RTAS
      handlers, let's just convert the whole structure to host endian before we
      call our RTAS handlers and back to big endian when we return to the guest.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b59d9d26
    • A
      KVM: PPC: Book3S PR: PAPR: Access HTAB in big endian · 1692aa3f
      Alexander Graf 提交于
      The HTAB on PPC is always in big endian. When we access it via hypercalls
      on behalf of the guest and we're running on a little endian host, we need
      to make sure we swap the bits accordingly.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1692aa3f
    • A
      KVM: PPC: Book3S PR: Default to big endian guest · 94810ba4
      Alexander Graf 提交于
      The default MSR when user space does not define anything should be identical
      on little and big endian hosts, so remove MSR_LE from it.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      94810ba4
    • A
      KVM: PPC: Book3S_64 PR: Access shadow slb in big endian · 14a7d41d
      Alexander Graf 提交于
      The "shadow SLB" in the PACA is shared with the hypervisor, so it has to
      be big endian. We access the shadow SLB during world switch, so let's make
      sure we access it in big endian even when we're on a little endian host.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      14a7d41d
    • A
      KVM: PPC: Book3S_64 PR: Access HTAB in big endian · 4e509af9
      Alexander Graf 提交于
      The HTAB is always big endian. We access the guest's HTAB using
      copy_from/to_user, but don't yet take care of the fact that we might
      be running on an LE host.
      
      Wrap all accesses to the guest HTAB with big endian accessors.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4e509af9
    • A
      KVM: PPC: Book3S_32: PR: Access HTAB in big endian · 860540bc
      Alexander Graf 提交于
      The HTAB is always big endian. We access the guest's HTAB using
      copy_from/to_user, but don't yet take care of the fact that we might
      be running on an LE host.
      
      Wrap all accesses to the guest HTAB with big endian accessors.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      860540bc
    • A
      KVM: PPC: Book3S: PR: Fix C/R bit setting · 740f834e
      Alexander Graf 提交于
      Commit 9308ab8e made C/R HTAB updates go byte-wise into the target HTAB.
      However, it didn't update the guest's copy of the HTAB, but instead the
      host local copy of it.
      
      Write to the guest's HTAB instead.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      CC: Paul Mackerras <paulus@samba.org>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      740f834e
    • A
      KVM: PPC: BOOK3S: PR: Fix WARN_ON with debug options on · 7562c4fd
      Aneesh Kumar K.V 提交于
      With debug option "sleep inside atomic section checking" enabled we get
      the below WARN_ON during a PR KVM boot. This is because upstream now
      have PREEMPT_COUNT enabled even if we have preempt disabled. Fix the
      warning by adding preempt_disable/enable around floating point and altivec
      enable.
      
      WARNING: at arch/powerpc/kernel/process.c:156
      Modules linked in: kvm_pr kvm
      CPU: 1 PID: 3990 Comm: qemu-system-ppc Tainted: G        W     3.15.0-rc1+ #4
      task: c0000000eb85b3a0 ti: c0000000ec59c000 task.ti: c0000000ec59c000
      NIP: c000000000015c84 LR: d000000003334644 CTR: c000000000015c00
      REGS: c0000000ec59f140 TRAP: 0700   Tainted: G        W      (3.15.0-rc1+)
      MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 42000024  XER: 20000000
      CFAR: c000000000015c24 SOFTE: 1
      GPR00: d000000003334644 c0000000ec59f3c0 c000000000e2fa40 c0000000e2f80000
      GPR04: 0000000000000800 0000000000002000 0000000000000001 8000000000000000
      GPR08: 0000000000000001 0000000000000001 0000000000002000 c000000000015c00
      GPR12: d00000000333da18 c00000000fb80900 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 00003fffce4e0fa1
      GPR20: 0000000000000010 0000000000000001 0000000000000002 00000000100b9a38
      GPR24: 0000000000000002 0000000000000000 0000000000000000 0000000000000013
      GPR28: 0000000000000000 c0000000eb85b3a0 0000000000002000 c0000000e2f80000
      NIP [c000000000015c84] .enable_kernel_fp+0x84/0x90
      LR [d000000003334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr]
      Call Trace:
      [c0000000ec59f3c0] [0000000000000010] 0x10 (unreliable)
      [c0000000ec59f430] [d000000003334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr]
      [c0000000ec59f4c0] [d00000000324b380] .kvmppc_set_msr+0x30/0x50 [kvm]
      [c0000000ec59f530] [d000000003337cac] .kvmppc_core_emulate_op_pr+0x16c/0x5e0 [kvm_pr]
      [c0000000ec59f5f0] [d00000000324a944] .kvmppc_emulate_instruction+0x284/0xa80 [kvm]
      [c0000000ec59f6c0] [d000000003336888] .kvmppc_handle_exit_pr+0x488/0xb70 [kvm_pr]
      [c0000000ec59f790] [d000000003338d34] kvm_start_lightweight+0xcc/0xdc [kvm_pr]
      [c0000000ec59f960] [d000000003336288] .kvmppc_vcpu_run_pr+0xc8/0x190 [kvm_pr]
      [c0000000ec59f9f0] [d00000000324c880] .kvmppc_vcpu_run+0x30/0x50 [kvm]
      [c0000000ec59fa60] [d000000003249e74] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [kvm]
      [c0000000ec59faf0] [d000000003244948] .kvm_vcpu_ioctl+0x478/0x760 [kvm]
      [c0000000ec59fcb0] [c000000000224e34] .do_vfs_ioctl+0x4d4/0x790
      [c0000000ec59fd90] [c000000000225148] .SyS_ioctl+0x58/0xb0
      [c0000000ec59fe30] [c00000000000a1e4] syscall_exit+0x0/0x98
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      7562c4fd
    • A
      KVM: PPC: BOOK3S: PR: Enable Little Endian PR guest · e5ee5422
      Aneesh Kumar K.V 提交于
      This patch make sure we inherit the LE bit correctly in different case
      so that we can run Little Endian distro in PR mode
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      e5ee5422
    • A
      KVM: PPC: E500: Add dcbtls emulation · 8f20a3ab
      Alexander Graf 提交于
      The dcbtls instruction is able to lock data inside the L1 cache.
      
      We don't want to give the guest actual access to hardware cache locks,
      as that could influence other VMs on the same system. But we can tell
      the guest that its locking attempt failed.
      
      By implementing the instruction we at least don't give the guest a
      program exception which it definitely does not expect.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8f20a3ab
    • A
      KVM: PPC: E500: Ignore L1CSR1_ICFI,ICLFR · 07fec1c2
      Alexander Graf 提交于
      The L1 instruction cache control register contains bits that indicate
      that we're still handling a request. Mask those out when we set the SPR
      so that a read doesn't assume we're still doing something.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      07fec1c2
  2. 22 5月, 2014 6 次提交
    • N
      KVM: vmx: DR7 masking on task switch emulation is wrong · 1f854112
      Nadav Amit 提交于
      The DR7 masking which is done on task switch emulation should be in hex format
      (clearing the local breakpoints enable bits 0,2,4 and 6).
      Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1f854112
    • D
      x86: fix page fault tracing when KVM guest support enabled · 65a7f03f
      Dave Hansen 提交于
      I noticed on some of my systems that page fault tracing doesn't
      work:
      
      	cd /sys/kernel/debug/tracing
      	echo 1 > events/exceptions/enable
      	cat trace;
      	# nothing shows up
      
      I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
      KVM VM, enabling that option breaks page fault tracing, and
      disabling fixes it.  I tried on some old kernels and this does
      not appear to be a regression: it never worked.
      
      There are two page-fault entry functions today.  One when tracing
      is on and another when it is off.  The KVM code calls do_page_fault()
      directly instead of calling the traced version:
      
      > dotraplinkage void __kprobes
      > do_async_page_fault(struct pt_regs *regs, unsigned long
      > error_code)
      > {
      >         enum ctx_state prev_state;
      >
      >         switch (kvm_read_and_reset_pf_reason()) {
      >         default:
      >                 do_page_fault(regs, error_code);
      >                 break;
      >         case KVM_PV_REASON_PAGE_NOT_PRESENT:
      
      I'm also having problems with the page fault tracing on bare
      metal (same symptom of no trace output).  I'm unsure if it's
      related.
      
      Steven had an alternative to this which has zero overhead when
      tracing is off where this includes the standard noops even when
      tracing is disabled.  I'm unconvinced that the extra complexity
      of his apporach:
      
      	http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home
      
      is worth it, expecially considering that the KVM code is already
      making page fault entry slower here.  This solution is
      dirt-simple.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      65a7f03f
    • P
      KVM: x86: get CPL from SS.DPL · ae9fedc7
      Paolo Bonzini 提交于
      CS.RPL is not equal to the CPL in the few instructions between
      setting CR0.PE and reloading CS.  And CS.DPL is also not equal
      to the CPL for conforming code segments.
      
      However, SS.DPL *is* always equal to the CPL except for the weird
      case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
      value in the STAR MSR, but force CPL=3 (Intel instead forces
      SS.DPL=SS.RPL=CPL=3).
      
      So this patch:
      
      - modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
      the above case with SYSRET is not broken further, and the way
      to fix it would be to pass the CPL to userspace and back
      
      - modifies VMX to always return the CPL from SS.DPL (except
      forcing it to 0 if we are emulating real mode via vm86 mode;
      in vm86 mode all DPLs have to be 3, but real mode does allow
      privileged instructions).  It also removes the CPL cache,
      which becomes a duplicate of the SS access rights cache.
      
      This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
      CR0.PE=1 but before CS has been reloaded.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ae9fedc7
    • P
      KVM: x86: check CS.DPL against RPL during task switch · 5045b468
      Paolo Bonzini 提交于
      Table 7-1 of the SDM mentions a check that the code segment's
      DPL must match the selector's RPL.  This was not done by KVM,
      fix it.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5045b468
    • P
      KVM: x86: drop set_rflags callback · fb5e336b
      Paolo Bonzini 提交于
      Not needed anymore now that the CPL is computed directly
      during task switch.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fb5e336b
    • P
      KVM: x86: use new CS.RPL as CPL during task switch · 2356aaeb
      Paolo Bonzini 提交于
      During task switch, all of CS.DPL, CS.RPL, SS.DPL must match (in addition
      to all the other requirements) and will be the new CPL.  So far this
      worked by carefully setting the CS selector and flag before doing the
      task switch; setting CS.selector will already change the CPL.
      
      However, this will not work once we get the CPL from SS.DPL, because
      then you will have to set the full segment descriptor cache to change
      the CPL.  ctxt->ops->cpl(ctxt) will then return the old CPL during the
      task switch, and the check that SS.DPL == CPL will fail.
      
      Temporarily assume that the CPL comes from CS.RPL during task switch
      to a protected-mode task.  This is the same approach used in QEMU's
      emulation code, which (until version 2.0) manually tracks the CPL.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2356aaeb
  3. 17 5月, 2014 1 次提交
    • P
      Merge tag 'kvm-s390-20140516' of... · afa538f0
      Paolo Bonzini 提交于
      Merge tag 'kvm-s390-20140516' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
      
      1. Correct locking for lazy storage key handling
         A test loop with multiple CPUs triggered a race in the lazy storage
         key handling as introduced by commit 934bc131
         (KVM: s390: Allow skeys to be enabled for the current process). This
         race should not happen with Linux guests, but let's fix it anyway.
         Patch touches !/kvm/ code, but is from the s390 maintainer.
      
      2. Better handling of broken guests
         If we detect a program check loop we stop the guest instead of
         wasting CPU cycles.
      
      3. Better handling on MVPG emulation
         The move page handling is improved to be architecturally correct.
      
      3. Trace point rework
         Let's rework the kvm trace points to have a common header file (for
         later perf usage) and provided a table based instruction decoder.
      
      4. Interpretive execution of SIGP external call
         Let the hardware handle most cases of SIGP external call (IPI) and
         wire up the fixup code for the corner cases.
      
      5. Initial preparations for the IBC facility
         Prepare the code to handle instruction blocking
      afa538f0
  4. 16 5月, 2014 12 次提交
  5. 13 5月, 2014 1 次提交
  6. 08 5月, 2014 1 次提交
    • G
      kvm: x86: emulate monitor and mwait instructions as nop · 87c00572
      Gabriel L. Somlo 提交于
      Treat monitor and mwait instructions as nop, which is architecturally
      correct (but inefficient) behavior. We do this to prevent misbehaving
      guests (e.g. OS X <= 10.7) from crashing after they fail to check for
      monitor/mwait availability via cpuid.
      
      Since mwait-based idle loops relying on these nop-emulated instructions
      would keep the host CPU pegged at 100%, do NOT advertise their presence
      via cpuid, to prevent compliant guests from using them inadvertently.
      Signed-off-by: NGabriel L. Somlo <somlo@cmu.edu>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      87c00572