1. 28 7月, 2014 7 次提交
    • A
      KVM: PPC: Book3S: Add hack for split real mode · c01e3f66
      Alexander Graf 提交于
      Today we handle split real mode by mapping both instruction and data faults
      into a special virtual address space that only exists during the split mode
      phase.
      
      This is good enough to catch 32bit Linux guests that use split real mode for
      copy_from/to_user. In this case we're always prefixed with 0xc0000000 for our
      instruction pointer and can map the user space process freely below there.
      
      However, that approach fails when we're running KVM inside of KVM. Here the 1st
      level last_inst reader may well be in the same virtual page as a 2nd level
      interrupt handler.
      
      It also fails when running Mac OS X guests. Here we have a 4G/4G split, so a
      kernel copy_from/to_user implementation can easily overlap with user space
      addresses.
      
      The architecturally correct way to fix this would be to implement an instruction
      interpreter in KVM that kicks in whenever we go into split real mode. This
      interpreter however would not receive a great amount of testing and be a lot of
      bloat for a reasonably isolated corner case.
      
      So I went back to the drawing board and tried to come up with a way to make
      split real mode work with a single flat address space. And then I realized that
      we could get away with the same trick that makes it work for Linux:
      
      Whenever we see an instruction address during split real mode that may collide,
      we just move it higher up the virtual address space to a place that hopefully
      does not collide (keep your fingers crossed!).
      
      That approach does work surprisingly well. I am able to successfully run
      Mac OS X guests with KVM and QEMU (no split real mode hacks like MOL) when I
      apply a tiny timing probe hack to QEMU. I'd say this is a win over even more
      broken split real mode :).
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c01e3f66
    • P
      KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled · ae2113a4
      Paul Mackerras 提交于
      This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL
      capability is used to enable or disable in-kernel handling of an
      hcall, that the hcall is actually implemented by the kernel.
      If not an EINVAL error is returned.
      
      This also checks the default-enabled list of hcalls and prints a
      warning if any hcall there is not actually implemented.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ae2113a4
    • P
      KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling · 699a0ea0
      Paul Mackerras 提交于
      This provides a way for userspace controls which sPAPR hcalls get
      handled in the kernel.  Each hcall can be individually enabled or
      disabled for in-kernel handling, except for H_RTAS.  The exception
      for H_RTAS is because userspace can already control whether
      individual RTAS functions are handled in-kernel or not via the
      KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for
      H_RTAS is out of the normal sequence of hcall numbers.
      
      Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the
      KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM.
      The args field of the struct kvm_enable_cap specifies the hcall number
      in args[0] and the enable/disable flag in args[1]; 0 means disable
      in-kernel handling (so that the hcall will always cause an exit to
      userspace) and 1 means enable.  Enabling or disabling in-kernel
      handling of an hcall is effective across the whole VM.
      
      The ability for KVM_ENABLE_CAP to be used on a VM file descriptor
      on PowerPC is new, added by this commit.  The KVM_CAP_ENABLE_CAP_VM
      capability advertises that this ability exists.
      
      When a VM is created, an initial set of hcalls are enabled for
      in-kernel handling.  The set that is enabled is the set that have
      an in-kernel implementation at this point.  Any new hcall
      implementations from this point onwards should not be added to the
      default set without a good reason.
      
      No distinction is made between real-mode and virtual-mode hcall
      implementations; the one setting controls them both.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      699a0ea0
    • A
      KVM: PPC: Book3S PR: Handle hyp doorbell exits · 568fccc4
      Alexander Graf 提交于
      If we're running PR KVM in HV mode, we may get hypervisor doorbell interrupts.
      Handle those the same way we treat normal doorbells.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      568fccc4
    • A
      KVM: PPC: Book3s PR: Disable AIL mode with OPAL · fb4188ba
      Alexander Graf 提交于
      When we're using PR KVM we must not allow the CPU to take interrupts
      in virtual mode, as the SLB does not contain host kernel mappings
      when running inside the guest context.
      
      To make sure we get good performance for non-KVM tasks but still
      properly functioning PR KVM, let's just disable AIL whenever a vcpu
      is scheduled in.
      
      This is fundamentally different from how we deal with AIL on pSeries
      type machines where we disable AIL for the whole machine as soon as
      a single KVM VM is up.
      
      The reason for that is easy - on pSeries we do not have control over
      per-cpu configuration of AIL. We also don't want to mess with CPU hotplug
      races and AIL configuration, so setting it per CPU is easier and more
      flexible.
      
      This patch fixes running PR KVM on POWER8 bare metal for me.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      fb4188ba
    • A
      KVM: PPC: BOOK3S: PR: Emulate instruction counter · 06da28e7
      Aneesh Kumar K.V 提交于
      Writing to IC is not allowed in the privileged mode.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      06da28e7
    • A
      KVM: PPC: BOOK3S: PR: Emulate virtual timebase register · 8f42ab27
      Aneesh Kumar K.V 提交于
      virtual time base register is a per VM, per cpu register that needs
      to be saved and restored on vm exit and entry. Writing to VTB is not
      allowed in the privileged mode.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [agraf: fix compile error]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8f42ab27
  2. 06 7月, 2014 1 次提交
  3. 30 5月, 2014 8 次提交
    • A
      KVM: PPC: Book3S PR: Expose TM registers · 9916d57e
      Alexander Graf 提交于
      POWER8 introduces transactional memory which brings along a number of new
      registers and MSR bits.
      
      Implementing all of those is a pretty big headache, so for now let's at least
      emulate enough to make Linux's context switching code happy.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9916d57e
    • A
      KVM: PPC: Book3S PR: Expose TAR facility to guest · e14e7a1e
      Alexander Graf 提交于
      POWER8 implements a new register called TAR. This register has to be
      enabled in FSCR and then from KVM's point of view is mere storage.
      
      This patch enables the guest to use TAR.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      e14e7a1e
    • A
      KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR · 616dff86
      Alexander Graf 提交于
      POWER8 introduced a new interrupt type called "Facility unavailable interrupt"
      which contains its status message in a new register called FSCR.
      
      Handle these exits and try to emulate instructions for unhandled facilities.
      Follow-on patches enable KVM to expose specific facilities into the guest.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      616dff86
    • A
      KVM: PPC: Book3S PR: Do dcbz32 patching with big endian instructions · cd087eef
      Alexander Graf 提交于
      When the host CPU we're running on doesn't support dcbz32 itself, but the
      guest wants to have dcbz only clear 32 bytes of data, we loop through every
      executable mapped page to search for dcbz instructions and patch them with
      a special privileged instruction that we emulate as dcbz32.
      
      The only guests that want to see dcbz act as 32byte are book3s_32 guests, so
      we don't have to worry about little endian instruction ordering. So let's
      just always search for big endian dcbz instructions, also when we're on a
      little endian host.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      cd087eef
    • A
      KVM: PPC: Make shared struct aka magic page guest endian · 5deb8e7a
      Alexander Graf 提交于
      The shared (magic) page is a data structure that contains often used
      supervisor privileged SPRs accessible via memory to the user to reduce
      the number of exits we have to take to read/write them.
      
      When we actually share this structure with the guest we have to maintain
      it in guest endianness, because some of the patch tricks only work with
      native endian load/store operations.
      
      Since we only share the structure with either host or guest in little
      endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv.
      
      For booke, the shared struct stays big endian. For book3s_64 hv we maintain
      the struct in host native endian, since it never gets shared with the guest.
      
      For book3s_64 pr we introduce a variable that tells us which endianness the
      shared struct is in and route every access to it through helper inline
      functions that evaluate this variable.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5deb8e7a
    • A
      KVM: PPC: Book3S PR: Default to big endian guest · 94810ba4
      Alexander Graf 提交于
      The default MSR when user space does not define anything should be identical
      on little and big endian hosts, so remove MSR_LE from it.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      94810ba4
    • A
      KVM: PPC: BOOK3S: PR: Fix WARN_ON with debug options on · 7562c4fd
      Aneesh Kumar K.V 提交于
      With debug option "sleep inside atomic section checking" enabled we get
      the below WARN_ON during a PR KVM boot. This is because upstream now
      have PREEMPT_COUNT enabled even if we have preempt disabled. Fix the
      warning by adding preempt_disable/enable around floating point and altivec
      enable.
      
      WARNING: at arch/powerpc/kernel/process.c:156
      Modules linked in: kvm_pr kvm
      CPU: 1 PID: 3990 Comm: qemu-system-ppc Tainted: G        W     3.15.0-rc1+ #4
      task: c0000000eb85b3a0 ti: c0000000ec59c000 task.ti: c0000000ec59c000
      NIP: c000000000015c84 LR: d000000003334644 CTR: c000000000015c00
      REGS: c0000000ec59f140 TRAP: 0700   Tainted: G        W      (3.15.0-rc1+)
      MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 42000024  XER: 20000000
      CFAR: c000000000015c24 SOFTE: 1
      GPR00: d000000003334644 c0000000ec59f3c0 c000000000e2fa40 c0000000e2f80000
      GPR04: 0000000000000800 0000000000002000 0000000000000001 8000000000000000
      GPR08: 0000000000000001 0000000000000001 0000000000002000 c000000000015c00
      GPR12: d00000000333da18 c00000000fb80900 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 00003fffce4e0fa1
      GPR20: 0000000000000010 0000000000000001 0000000000000002 00000000100b9a38
      GPR24: 0000000000000002 0000000000000000 0000000000000000 0000000000000013
      GPR28: 0000000000000000 c0000000eb85b3a0 0000000000002000 c0000000e2f80000
      NIP [c000000000015c84] .enable_kernel_fp+0x84/0x90
      LR [d000000003334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr]
      Call Trace:
      [c0000000ec59f3c0] [0000000000000010] 0x10 (unreliable)
      [c0000000ec59f430] [d000000003334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr]
      [c0000000ec59f4c0] [d00000000324b380] .kvmppc_set_msr+0x30/0x50 [kvm]
      [c0000000ec59f530] [d000000003337cac] .kvmppc_core_emulate_op_pr+0x16c/0x5e0 [kvm_pr]
      [c0000000ec59f5f0] [d00000000324a944] .kvmppc_emulate_instruction+0x284/0xa80 [kvm]
      [c0000000ec59f6c0] [d000000003336888] .kvmppc_handle_exit_pr+0x488/0xb70 [kvm_pr]
      [c0000000ec59f790] [d000000003338d34] kvm_start_lightweight+0xcc/0xdc [kvm_pr]
      [c0000000ec59f960] [d000000003336288] .kvmppc_vcpu_run_pr+0xc8/0x190 [kvm_pr]
      [c0000000ec59f9f0] [d00000000324c880] .kvmppc_vcpu_run+0x30/0x50 [kvm]
      [c0000000ec59fa60] [d000000003249e74] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [kvm]
      [c0000000ec59faf0] [d000000003244948] .kvm_vcpu_ioctl+0x478/0x760 [kvm]
      [c0000000ec59fcb0] [c000000000224e34] .do_vfs_ioctl+0x4d4/0x790
      [c0000000ec59fd90] [c000000000225148] .SyS_ioctl+0x58/0xb0
      [c0000000ec59fe30] [c00000000000a1e4] syscall_exit+0x0/0x98
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      7562c4fd
    • A
      KVM: PPC: BOOK3S: PR: Enable Little Endian PR guest · e5ee5422
      Aneesh Kumar K.V 提交于
      This patch make sure we inherit the LE bit correctly in different case
      so that we can run Little Endian distro in PR mode
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      e5ee5422
  4. 28 4月, 2014 1 次提交
  5. 27 1月, 2014 2 次提交
    • P
      KVM: PPC: Book3S PR: Cope with doorbell interrupts · 40688909
      Paul Mackerras 提交于
      When the PR host is running on a POWER8 machine in POWER8 mode, it
      will use doorbell interrupts for IPIs.  If one of them arrives while
      we are in the guest, we pop out of the guest with trap number 0xA00,
      which isn't handled by kvmppc_handle_exit_pr, leading to the following
      BUG_ON:
      
      [  331.436215] exit_nr=0xa00 | pc=0x1d2c | msr=0x800000000000d032
      [  331.437522] ------------[ cut here ]------------
      [  331.438296] kernel BUG at arch/powerpc/kvm/book3s_pr.c:982!
      [  331.439063] Oops: Exception in kernel mode, sig: 5 [#2]
      [  331.439819] SMP NR_CPUS=1024 NUMA pSeries
      [  331.440552] Modules linked in: tun nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw virtio_net kvm binfmt_misc ibmvscsi scsi_transport_srp scsi_tgt virtio_blk
      [  331.447614] CPU: 11 PID: 1296 Comm: qemu-system-ppc Tainted: G      D      3.11.7-200.2.fc19.ppc64p7 #1
      [  331.448920] task: c0000003bdc8c000 ti: c0000003bd32c000 task.ti: c0000003bd32c000
      [  331.450088] NIP: d0000000025d6b9c LR: d0000000025d6b98 CTR: c0000000004cfdd0
      [  331.451042] REGS: c0000003bd32f420 TRAP: 0700   Tainted: G      D       (3.11.7-200.2.fc19.ppc64p7)
      [  331.452331] MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 28004824  XER: 20000000
      [  331.454616] SOFTE: 1
      [  331.455106] CFAR: c000000000848bb8
      [  331.455726]
      GPR00: d0000000025d6b98 c0000003bd32f6a0 d0000000026017b8 0000000000000032
      GPR04: c0000000018627f8 c000000001873208 320d0a3030303030 3030303030643033
      GPR08: c000000000c490a8 0000000000000000 0000000000000000 0000000000000002
      GPR12: 0000000028004822 c00000000fdc6300 0000000000000000 00000100076ec310
      GPR16: 000000002ae343b8 00003ffffd397398 0000000000000000 0000000000000000
      GPR20: 00000100076f16f4 00000100076ebe60 0000000000000008 ffffffffffffffff
      GPR24: 0000000000000000 0000008001041e60 0000000000000000 0000008001040ce8
      GPR28: c0000003a2d80000 0000000000000a00 0000000000000001 c0000003a2681810
      [  331.466504] NIP [d0000000025d6b9c] .kvmppc_handle_exit_pr+0x75c/0xa80 [kvm]
      [  331.466999] LR [d0000000025d6b98] .kvmppc_handle_exit_pr+0x758/0xa80 [kvm]
      [  331.467517] Call Trace:
      [  331.467909] [c0000003bd32f6a0] [d0000000025d6b98] .kvmppc_handle_exit_pr+0x758/0xa80 [kvm] (unreliable)
      [  331.468553] [c0000003bd32f750] [d0000000025d98f0] kvm_start_lightweight+0xb4/0xc4 [kvm]
      [  331.469189] [c0000003bd32f920] [d0000000025d7648] .kvmppc_vcpu_run_pr+0xd8/0x270 [kvm]
      [  331.469838] [c0000003bd32f9c0] [d0000000025cf748] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
      [  331.470790] [c0000003bd32fa50] [d0000000025cc19c] .kvm_arch_vcpu_ioctl_run+0x5c/0x1b0 [kvm]
      [  331.471401] [c0000003bd32fae0] [d0000000025c4888] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
      [  331.472026] [c0000003bd32fc90] [c00000000026192c] .do_vfs_ioctl+0x4dc/0x7a0
      [  331.472561] [c0000003bd32fd80] [c000000000261cc4] .SyS_ioctl+0xd4/0xf0
      [  331.473095] [c0000003bd32fe30] [c000000000009ed8] syscall_exit+0x0/0x98
      [  331.473633] Instruction dump:
      [  331.473766] 4bfff9b4 2b9d0800 419efc18 60000000 60420000 3d220000 e8bf11a0 e8df12a8
      [  331.474733] 7fa4eb78 e8698660 48015165 e8410028 <0fe00000> 813f00e4 3ba00000 39290001
      [  331.475386] ---[ end trace 49fc47d994c1f8f2 ]---
      [  331.479817]
      
      This fixes the problem by making kvmppc_handle_exit_pr() recognize the
      interrupt.  We also need to jump to the doorbell interrupt handler in
      book3s_segment.S to handle the interrupt on the way out of the guest.
      Having done that, there's nothing further to be done in
      kvmppc_handle_exit_pr().
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      40688909
    • S
      kvm/ppc: IRQ disabling cleanup · 6c85f52b
      Scott Wood 提交于
      Simplify the handling of lazy EE by going directly from fully-enabled
      to hard-disabled.  This replaces the lazy_irq_pending() check
      (including its misplaced kvm_guest_exit() call).
      
      As suggested by Tiejun Chen, move the interrupt disabling into
      kvmppc_prepare_to_enter() rather than have each caller do it.  Also
      move the IRQ enabling on heavyweight exit into
      kvmppc_prepare_to_enter().
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      6c85f52b
  6. 09 1月, 2014 4 次提交
    • P
      KVM: PPC: Load/save FP/VMX/VSX state directly to/from vcpu struct · 99dae3ba
      Paul Mackerras 提交于
      Now that we have the vcpu floating-point and vector state stored in
      the same type of struct as the main kernel uses, we can load that
      state directly from the vcpu struct instead of having extra copies
      to/from the thread_struct.  Similarly, when the guest state needs to
      be saved, we can have it saved it directly to the vcpu struct by
      setting the current->thread.fp_save_area and current->thread.vr_save_area
      pointers.  That also means that we don't need to back up and restore
      userspace's FP/vector state.  This all makes the code simpler and
      faster.
      
      Note that it's not necessary to save or modify current->thread.fpexc_mode,
      since nothing in KVM uses or is affected by its value.  Nor is it
      necessary to touch used_vr or used_vsr.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      99dae3ba
    • P
      KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structures · efff1912
      Paul Mackerras 提交于
      This uses struct thread_fp_state and struct thread_vr_state to store
      the floating-point, VMX/Altivec and VSX state, rather than flat arrays.
      This makes transferring the state to/from the thread_struct simpler
      and allows us to unify the get/set_one_reg implementations for the
      VSX registers.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      efff1912
    • P
      KVM: PPC: Use load_fp/vr_state rather than load_up_fpu/altivec · 09548fda
      Paul Mackerras 提交于
      The load_up_fpu and load_up_altivec functions were never intended to
      be called from C, and do things like modifying the MSR value in their
      callers' stack frames, which are assumed to be interrupt frames.  In
      addition, on 32-bit Book S they require the MMU to be off.
      
      This makes KVM use the new load_fp_state() and load_vr_state() functions
      instead of load_up_fpu/altivec.  This means we can remove the assembler
      glue in book3s_rmhandlers.S, and potentially fixes a bug on Book E,
      where load_up_fpu was called directly from C.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      09548fda
    • A
      KVM: PPC: Add devname:kvm aliases for modules · 398a76c6
      Alexander Graf 提交于
      Systems that support automatic loading of kernel modules through
      device aliases should try and automatically load kvm when /dev/kvm
      gets opened.
      
      Add code to support that magic for all PPC kvm targets, even the
      ones that don't support modules yet.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      398a76c6
  7. 09 12月, 2013 1 次提交
    • A
      KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy · 40fdd8c8
      Alexander Graf 提交于
      As soon as we get back to our "highmem" handler in virtual address
      space we may get preempted. Today the reason we can get preempted is
      that we replay interrupts and all the lazy logic thinks we have
      interrupts enabled.
      
      However, it's not hard to make the code interruptible and that way
      we can enable and handle interrupts even earlier.
      
      This fixes random guest crashes that happened with CONFIG_PREEMPT=y
      for me.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      40fdd8c8
  8. 18 10月, 2013 2 次提交
  9. 17 10月, 2013 12 次提交
    • A
      kvm: powerpc: book3s: Support building HV and PR KVM as module · 2ba9f0d8
      Aneesh Kumar K.V 提交于
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [agraf: squash in compile fix]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      2ba9f0d8
    • A
      kvm: powerpc: book3s: pr: move PR related tracepoints to a separate header · 72c12535
      Aneesh Kumar K.V 提交于
      This patch moves PR related tracepoints to a separate header. This
      enables in converting PR to a kernel module which will be done in
      later patches
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      72c12535
    • A
      kvm: powerpc: book3s: Add is_hv_enabled to kvmppc_ops · 699cc876
      Aneesh Kumar K.V 提交于
      This help us to identify whether we are running with hypervisor mode KVM
      enabled. The change is needed so that we can have both HV and PR kvm
      enabled in the same kernel.
      
      If both HV and PR KVM are included, interrupts come in to the HV version
      of the kvmppc_interrupt code, which then jumps to the PR handler,
      renamed to kvmppc_interrupt_pr, if the guest is a PR guest.
      
      Allowing both PR and HV in the same kernel required some changes to
      kvm_dev_ioctl_check_extension(), since the values returned now can't
      be selected with #ifdefs as much as previously. We look at is_hv_enabled
      to return the right value when checking for capabilities.For capabilities that
      are only provided by HV KVM, we return the HV value only if
      is_hv_enabled is true. For capabilities provided by PR KVM but not HV,
      we return the PR value only if is_hv_enabled is false.
      
      NOTE: in later patch we replace is_hv_enabled with a static inline
      function comparing kvm_ppc_ops
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      699cc876
    • A
      kvm: powerpc: Add kvmppc_ops callback · 3a167bea
      Aneesh Kumar K.V 提交于
      This patch add a new callback kvmppc_ops. This will help us in enabling
      both HV and PR KVM together in the same kernel. The actual change to
      enable them together is done in the later patch in the series.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [agraf: squash in booke changes]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3a167bea
    • P
      KVM: PPC: Book3S PR: Reduce number of shadow PTEs invalidated by MMU notifiers · 491d6ecc
      Paul Mackerras 提交于
      Currently, whenever any of the MMU notifier callbacks get called, we
      invalidate all the shadow PTEs.  This is inefficient because it means
      that we typically then get a lot of DSIs and ISIs in the guest to fault
      the shadow PTEs back in.  We do this even if the address range being
      notified doesn't correspond to guest memory.
      
      This commit adds code to scan the memslot array to find out what range(s)
      of guest physical addresses corresponds to the host virtual address range
      being affected.  For each such range we flush only the shadow PTEs
      for the range, on all cpus.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      491d6ecc
    • P
      KVM: PPC: Book3S PR: Better handling of host-side read-only pages · 93b159b4
      Paul Mackerras 提交于
      Currently we request write access to all pages that get mapped into the
      guest, even if the guest is only loading from the page.  This reduces
      the effectiveness of KSM because it means that we unshare every page we
      access.  Also, we always set the changed (C) bit in the guest HPTE if
      it allows writing, even for a guest load.
      
      This fixes both these problems.  We pass an 'iswrite' flag to the
      mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether
      the access is a load or a store.  The mmu.xlate() functions now only
      set C for stores.  kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot()
      instead of gfn_to_pfn() so that it can indicate whether we need write
      access to the page, and get back a 'writable' flag to indicate whether
      the page is writable or not.  If that 'writable' flag is clear, we then
      make the host HPTE read-only even if the guest HPTE allowed writing.
      
      This means that we can get a protection fault when the guest writes to a
      page that it has mapped read-write but which is read-only on the host
      side (perhaps due to KSM having merged the page).  Thus we now call
      kvmppc_handle_pagefault() for protection faults as well as HPTE not found
      faults.  In kvmppc_handle_pagefault(), if the access was allowed by the
      guest HPTE and we thus need to install a new host HPTE, we then need to
      remove the old host HPTE if there is one.  This is done with a new
      function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to
      find and remove the old host HPTE.
      
      Since the memslot-related functions require the KVM SRCU read lock to
      be held, this adds srcu_read_lock/unlock pairs around the calls to
      kvmppc_handle_pagefault().
      
      Finally, this changes kvmppc_mmu_book3s_32_xlate_pte() to not ignore
      guest HPTEs that don't permit access, and to return -EPERM for accesses
      that are not permitted by the page protections.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      93b159b4
    • P
      KVM: PPC: Book3S PR: Allocate kvm_vcpu structs from kvm_vcpu_cache · 3ff95502
      Paul Mackerras 提交于
      This makes PR KVM allocate its kvm_vcpu structs from the kvm_vcpu_cache
      rather than having them embedded in the kvmppc_vcpu_book3s struct,
      which is allocated with vzalloc.  The reason is to reduce the
      differences between PR and HV KVM in order to make is easier to have
      them coexist in one kernel binary.
      
      With this, the kvm_vcpu struct has a pointer to the kvmppc_vcpu_book3s
      struct.  The pointer to the kvmppc_book3s_shadow_vcpu struct has moved
      from the kvmppc_vcpu_book3s struct to the kvm_vcpu struct, and is only
      present for 32-bit, since it is only used for 32-bit.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: squash in compile fix from Aneesh]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3ff95502
    • P
      KVM: PPC: Book3S PR: Make HPT accesses and updates SMP-safe · 9308ab8e
      Paul Mackerras 提交于
      This adds a per-VM mutex to provide mutual exclusion between vcpus
      for accesses to and updates of the guest hashed page table (HPT).
      This also makes the code use single-byte writes to the HPT entry
      when updating of the reference (R) and change (C) bits.  The reason
      for doing this, rather than writing back the whole HPTE, is that on
      non-PAPR virtual machines, the guest OS might be writing to the HPTE
      concurrently, and writing back the whole HPTE might conflict with
      that.  Also, real hardware does single-byte writes to update R and C.
      
      The new mutex is taken in kvmppc_mmu_book3s_64_xlate() when reading
      the HPT and updating R and/or C, and in the PAPR HPT update hcalls
      (H_ENTER, H_REMOVE, etc.).  Having the mutex means that we don't need
      to use a hypervisor lock bit in the HPT update hcalls, and we don't
      need to be careful about the order in which the bytes of the HPTE are
      updated by those hcalls.
      
      The other change here is to make emulated TLB invalidations (tlbie)
      effective across all vcpus.  To do this we call kvmppc_mmu_pte_vflush
      for all vcpus in kvmppc_ppc_book3s_64_tlbie().
      
      For 32-bit, this makes the setting of the accessed and dirty bits use
      single-byte writes, and makes tlbie invalidate shadow HPTEs for all
      vcpus.
      
      With this, PR KVM can successfully run SMP guests.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9308ab8e
    • P
      KVM: PPC: Book3S PR: Use 64k host pages where possible · c9029c34
      Paul Mackerras 提交于
      Currently, PR KVM uses 4k pages for the host-side mappings of guest
      memory, regardless of the host page size.  When the host page size is
      64kB, we might as well use 64k host page mappings for guest mappings
      of 64kB and larger pages and for guest real-mode mappings.  However,
      the magic page has to remain a 4k page.
      
      To implement this, we first add another flag bit to the guest VSID
      values we use, to indicate that this segment is one where host pages
      should be mapped using 64k pages.  For segments with this bit set
      we set the bits in the shadow SLB entry to indicate a 64k base page
      size.  When faulting in host HPTEs for this segment, we make them
      64k HPTEs instead of 4k.  We record the pagesize in struct hpte_cache
      for use when invalidating the HPTE.
      
      For now we restrict the segment containing the magic page (if any) to
      4k pages.  It should be possible to lift this restriction in future
      by ensuring that the magic 4k page is appropriately positioned within
      a host 64k page.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c9029c34
    • P
      KVM: PPC: Book3S PR: Allow guest to use 64k pages · a4a0f252
      Paul Mackerras 提交于
      This adds the code to interpret 64k HPTEs in the guest hashed page
      table (HPT), 64k SLB entries, and to tell the guest about 64k pages
      in kvm_vm_ioctl_get_smmu_info().  Guest 64k pages are still shadowed
      by 4k pages.
      
      This also adds another hash table to the four we have already in
      book3s_mmu_hpte.c to allow us to find all the PTEs that we have
      instantiated that match a given 64k guest page.
      
      The tlbie instruction changed starting with POWER6 to use a bit in
      the RB operand to indicate large page invalidations, and to use other
      RB bits to indicate the base and actual page sizes and the segment
      size.  64k pages came in slightly earlier, with POWER5++.
      We use one bit in vcpu->arch.hflags to indicate that the emulated
      cpu supports 64k pages, and another to indicate that it has the new
      tlbie definition.
      
      The KVM_PPC_GET_SMMU_INFO ioctl presents a bit of a problem, because
      the MMU capabilities depend on which CPU model we're emulating, but it
      is a VM ioctl not a VCPU ioctl and therefore doesn't get passed a VCPU
      fd.  In addition, commonly-used userspace (QEMU) calls it before
      setting the PVR for any VCPU.  Therefore, as a best effort we look at
      the first vcpu in the VM and return 64k pages or not depending on its
      capabilities.  We also make the PVR default to the host PVR on recent
      CPUs that support 1TB segments (and therefore multiple page sizes as
      well) so that KVM_PPC_GET_SMMU_INFO will include 64k page and 1TB
      segment support on those CPUs.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a4a0f252
    • P
      KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu · a2d56020
      Paul Mackerras 提交于
      Currently PR-style KVM keeps the volatile guest register values
      (R0 - R13, CR, LR, CTR, XER, PC) in a shadow_vcpu struct rather than
      the main kvm_vcpu struct.  For 64-bit, the shadow_vcpu exists in two
      places, a kmalloc'd struct and in the PACA, and it gets copied back
      and forth in kvmppc_core_vcpu_load/put(), because the real-mode code
      can't rely on being able to access the kmalloc'd struct.
      
      This changes the code to copy the volatile values into the shadow_vcpu
      as one of the last things done before entering the guest.  Similarly
      the values are copied back out of the shadow_vcpu to the kvm_vcpu
      immediately after exiting the guest.  We arrange for interrupts to be
      still disabled at this point so that we can't get preempted on 64-bit
      and end up copying values from the wrong PACA.
      
      This means that the accessor functions in kvm_book3s.h for these
      registers are greatly simplified, and are same between PR and HV KVM.
      In places where accesses to shadow_vcpu fields are now replaced by
      accesses to the kvm_vcpu, we can also remove the svcpu_get/put pairs.
      Finally, on 64-bit, we don't need the kmalloc'd struct at all any more.
      
      With this, the time to read the PVR one million times in a loop went
      from 567.7ms to 575.5ms (averages of 6 values), an increase of about
      1.4% for this worse-case test for guest entries and exits.  The
      standard deviation of the measurements is about 11ms, so the
      difference is only marginally significant statistically.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a2d56020
    • P
      KVM: PPC: Book3S PR: Fix compilation without CONFIG_ALTIVEC · f2481771
      Paul Mackerras 提交于
      Commit 9d1ffdd8 ("KVM: PPC: Book3S PR: Don't corrupt guest state
      when kernel uses VMX") added a call to kvmppc_load_up_altivec() that
      isn't guarded by CONFIG_ALTIVEC, causing a link failure when building
      a kernel without CONFIG_ALTIVEC set.  This adds an #ifdef to fix this.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f2481771
  10. 11 10月, 2013 1 次提交
    • P
      powerpc: Put FP/VSX and VR state into structures · de79f7b9
      Paul Mackerras 提交于
      This creates new 'thread_fp_state' and 'thread_vr_state' structures
      to store FP/VSX state (including FPSCR) and Altivec/VSX state
      (including VSCR), and uses them in the thread_struct.  In the
      thread_fp_state, the FPRs and VSRs are represented as u64 rather
      than double, since we rarely perform floating-point computations
      on the values, and this will enable the structures to be used
      in KVM code as well.  Similarly FPSCR is now a u64 rather than
      a structure of two 32-bit values.
      
      This takes the offsets out of the macros such as SAVE_32FPRS,
      REST_32FPRS, etc.  This enables the same macros to be used for normal
      and transactional state, enabling us to delete the transactional
      versions of the macros.   This also removes the unused do_load_up_fpu
      and do_load_up_altivec, which were in fact buggy since they didn't
      create large enough stack frames to account for the fact that
      load_up_fpu and load_up_altivec are not designed to be called from C
      and assume that their caller's stack frame is an interrupt frame.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      de79f7b9
  11. 28 8月, 2013 1 次提交
    • P
      KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls · 8b23de29
      Paul Mackerras 提交于
      It turns out that if we exit the guest due to a hcall instruction (sc 1),
      and the loading of the instruction in the guest exit path fails for any
      reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the
      instruction after the hcall instruction rather than the hcall itself.
      This in turn means that the instruction doesn't get recognized as an
      hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel
      as a sc instruction.  That usually results in the guest kernel getting
      a return code of 38 (ENOSYS) from an hcall, which often triggers a
      BUG_ON() or other failure.
      
      This fixes the problem by adding a new variant of kvmppc_get_last_inst()
      called kvmppc_get_last_sc(), which fetches the instruction if necessary
      from pc - 4 rather than pc.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8b23de29