1. 06 11月, 2015 1 次提交
    • P
      KVM: PPC: Book3S HV: Don't dynamically split core when already split · f74f2e2e
      Paul Mackerras 提交于
      In static micro-threading modes, the dynamic micro-threading code
      is supposed to be disabled, because subcores can't make independent
      decisions about what micro-threading mode to put the core in - there is
      only one micro-threading mode for the whole core.  The code that
      implements dynamic micro-threading checks for this, except that the
      check was missed in one case.  This means that it is possible for a
      subcore in static 2-way micro-threading mode to try to put the core
      into 4-way micro-threading mode, which usually leads to stuck CPUs,
      spinlock lockups, and other stalls in the host.
      
      The problem was in the can_split_piggybacked_subcores() function, which
      should always return false if the system is in a static micro-threading
      mode.  This fixes the problem by making can_split_piggybacked_subcores()
      use subcore_config_ok() for its checks, as subcore_config_ok() includes
      the necessary check for the static micro-threading modes.
      
      Credit to Gautham Shenoy for working out that the reason for the hangs
      and stalls we were seeing was that we were trying to do dynamic 4-way
      micro-threading while we were in static 2-way mode.
      
      Fixes: b4deba5c
      Cc: vger@stable.kernel.org # v4.3
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      f74f2e2e
  2. 21 10月, 2015 1 次提交
    • P
      powerpc: Revert "Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8" · 23316316
      Paul Mackerras 提交于
      This reverts commit 9678cdaa ("Use the POWER8 Micro Partition
      Prefetch Engine in KVM HV on POWER8") because the original commit had
      multiple, partly self-cancelling bugs, that could cause occasional
      memory corruption.
      
      In fact the logmpp instruction was incorrectly using register r0 as the
      source of the buffer address and operation code, and depending on what
      was in r0, it would either do nothing or corrupt the 64k page pointed to
      by r0.
      
      The logmpp instruction encoding and the operation code definitions could
      be corrected, but then there is the problem that there is no clearly
      defined way to know when the hardware has finished writing to the
      buffer.
      
      The original commit attempted to work around this by aborting the
      write-out before starting the prefetch, but this is ineffective in the
      case where the virtual core is now executing on a different physical
      core from the one where the write-out was initiated.
      
      These problems plus advice from the hardware designers not to use the
      function (since the measured performance improvement from using the
      feature was actually mostly negative), mean that reverting the code is
      the best option.
      
      Fixes: 9678cdaa ("Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8")
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      23316316
  3. 21 9月, 2015 1 次提交
    • P
      KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs · 5fc3e64f
      Paul Mackerras 提交于
      This fixes a bug which results in stale vcore pointers being left in
      the per-cpu preempted vcore lists when a VM is destroyed.  The result
      of the stale vcore pointers is usually either a crash or a lockup
      inside collect_piggybacks() when another VM is run.  A typical
      lockup message looks like:
      
      [  472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039]
      [  472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv]
      [  472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ #49
      [  472.162187] task: c000001e38512750 ti: c000001e41bfc000 task.ti: c000001e41bfc000
      [  472.162262] NIP: c00000000096b094 LR: c00000000096b08c CTR: c000000000111130
      [  472.162337] REGS: c000001e41bff520 TRAP: 0901   Not tainted  (4.2.0-kvm+)
      [  472.162399] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24848844  XER: 00000000
      [  472.162588] CFAR: c00000000096b0ac SOFTE: 1
      GPR00: c000000000111170 c000001e41bff7a0 c00000000127df00 0000000000000001
      GPR04: 0000000000000003 0000000000000001 0000000000000000 0000000000874821
      GPR08: c000001e41bff8e0 0000000000000001 0000000000000000 d00000000efde740
      GPR12: c000000000111130 c00000000fdae400
      [  472.163053] NIP [c00000000096b094] _raw_spin_lock_irqsave+0xa4/0x130
      [  472.163117] LR [c00000000096b08c] _raw_spin_lock_irqsave+0x9c/0x130
      [  472.163179] Call Trace:
      [  472.163206] [c000001e41bff7a0] [c000001e41bff7f0] 0xc000001e41bff7f0 (unreliable)
      [  472.163295] [c000001e41bff7e0] [c000000000111170] __wake_up+0x40/0x90
      [  472.163375] [c000001e41bff830] [d00000000efd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv]
      [  472.163465] [c000001e41bffa30] [d00000000efd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
      [  472.163559] [c000001e41bffb70] [d00000000e9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm]
      [  472.163653] [c000001e41bffba0] [d00000000e92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
      [  472.163745] [c000001e41bffbe0] [d00000000e9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
      [  472.163834] [c000001e41bffd40] [c0000000002d0f50] do_vfs_ioctl+0x480/0x7c0
      [  472.163910] [c000001e41bffde0] [c0000000002d1364] SyS_ioctl+0xd4/0xf0
      [  472.163986] [c000001e41bffe30] [c000000000009260] system_call+0x38/0xd0
      [  472.164060] Instruction dump:
      [  472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 60000000 60000000 60420000 8bad02e2
      [  472.164224] 7fc3f378 4b6a57c1 60000000 7c210b78 <e92d0000> 89290009 792affe3 40820070
      
      The bug is that kvmppc_run_vcpu does not correctly handle the case
      where a vcpu task receives a signal while its guest vcpu is executing
      in the guest as a result of being piggy-backed onto the execution of
      another vcore.  In that case we need to wait for the vcpu to finish
      executing inside the guest, and then remove this vcore from the
      preempted vcores list.  That way, we avoid leaving this vcpu's vcore
      on the preempted vcores list when the vcpu gets interrupted.
      
      Fixes: ec257165Reported-by: NThomas Huth <thuth@redhat.com>
      Tested-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      5fc3e64f
  4. 03 9月, 2015 1 次提交
    • G
      KVM: PPC: Book3S HV: Fix race in starting secondary threads · 7f235328
      Gautham R. Shenoy 提交于
      The current dynamic micro-threading code has a race due to which a
      secondary thread naps when it is supposed to be running a vcpu. As a
      side effect of this, on a guest exit, the primary thread in
      kvmppc_wait_for_nap() finds that this secondary thread hasn't cleared
      its vcore pointer. This results in "CPU X seems to be stuck!"
      warnings.
      
      The race is possible since the primary thread on exiting the guests
      only waits for all the secondaries to clear its vcore pointer. It
      subsequently expects the secondary threads to enter nap while it
      unsplits the core. A secondary thread which hasn't yet entered the nap
      will loop in kvm_no_guest until its vcore pointer and the do_nap flag
      are unset. Once the core has been unsplit, a new vcpu thread can grab
      the core and set the do_nap flag *before* setting the vcore pointers
      of the secondary. As a result, the secondary thread will now enter nap
      via kvm_unsplit_nap instead of running the guest vcpu.
      
      Fix this by setting the do_nap flag after setting the vcore pointer in
      the PACA of the secondary in kvmppc_run_core. Also, ensure that a
      secondary thread doesn't nap in kvm_unsplit_nap when the vcore pointer
      in its PACA struct is set.
      
      Fixes: b4deba5cSigned-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      7f235328
  5. 22 8月, 2015 5 次提交
    • P
      KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation · 563a1e93
      Paul Mackerras 提交于
      Whenever a vcore state is VCORE_PREEMPT we need to be counting stolen
      time for it.  This currently isn't the case when we have a vcore that
      no longer has any runnable threads in it but still has a runner task,
      so we do an explicit call to kvmppc_core_start_stolen() in that case.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      563a1e93
    • P
      KVM: PPC: Book3S HV: Fix preempted vcore list locking · 402813fe
      Paul Mackerras 提交于
      When a vcore gets preempted, we put it on the preempted vcore list for
      the current CPU.  The runner task then calls schedule() and comes back
      some time later and takes itself off the list.  We need to be careful
      to lock the list that it was put onto, which may not be the list for the
      current CPU since the runner task may have moved to another CPU.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      402813fe
    • P
      KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8 · b4deba5c
      Paul Mackerras 提交于
      This builds on the ability to run more than one vcore on a physical
      core by using the micro-threading (split-core) modes of the POWER8
      chip.  Previously, only vcores from the same VM could be run together,
      and (on POWER8) only if they had just one thread per core.  With the
      ability to split the core on guest entry and unsplit it on guest exit,
      we can run up to 8 vcpu threads from up to 4 different VMs, and we can
      run multiple vcores with 2 or 4 vcpus per vcore.
      
      Dynamic micro-threading is only available if the static configuration
      of the cores is whole-core mode (unsplit), and only on POWER8.
      
      To manage this, we introduce a new kvm_split_mode struct which is
      shared across all of the subcores in the core, with a pointer in the
      paca on each thread.  In addition we extend the core_info struct to
      have information on each subcore.  When deciding whether to add a
      vcore to the set already on the core, we now have two possibilities:
      (a) piggyback the vcore onto an existing subcore, or (b) start a new
      subcore.
      
      Currently, when any vcpu needs to exit the guest and switch to host
      virtual mode, we interrupt all the threads in all subcores and switch
      the core back to whole-core mode.  It may be possible in future to
      allow some of the subcores to keep executing in the guest while
      subcore 0 switches to the host, but that is not implemented in this
      patch.
      
      This adds a module parameter called dynamic_mt_modes which controls
      which micro-threading (split-core) modes the code will consider, as a
      bitmap.  In other words, if it is 0, no micro-threading mode is
      considered; if it is 2, only 2-way micro-threading is considered; if
      it is 4, only 4-way, and if it is 6, both 2-way and 4-way
      micro-threading mode will be considered.  The default is 6.
      
      With this, we now have secondary threads which are the primary thread
      for their subcore and therefore need to do the MMU switch.  These
      threads will need to be started even if they have no vcpu to run, so
      we use the vcore pointer in the PACA rather than the vcpu pointer to
      trigger them.
      
      It is now possible for thread 0 to find that an exit has been
      requested before it gets to switch the subcore state to the guest.  In
      that case we haven't added the guest's timebase offset to the
      timebase, so we need to be careful not to subtract the offset in the
      guest exit path.  In fact we just skip the whole path that switches
      back to host context, since we haven't switched to the guest context.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b4deba5c
    • P
      KVM: PPC: Book3S HV: Make use of unused threads when running guests · ec257165
      Paul Mackerras 提交于
      When running a virtual core of a guest that is configured with fewer
      threads per core than the physical cores have, the extra physical
      threads are currently unused.  This makes it possible to use them to
      run one or more other virtual cores from the same guest when certain
      conditions are met.  This applies on POWER7, and on POWER8 to guests
      with one thread per virtual core.  (It doesn't apply to POWER8 guests
      with multiple threads per vcore because they require a 1-1 virtual to
      physical thread mapping in order to be able to use msgsndp and the
      TIR.)
      
      The idea is that we maintain a list of preempted vcores for each
      physical cpu (i.e. each core, since the host runs single-threaded).
      Then, when a vcore is about to run, it checks to see if there are
      any vcores on the list for its physical cpu that could be
      piggybacked onto this vcore's execution.  If so, those additional
      vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
      threads are started as well as the original vcore, which is called
      the master vcore.
      
      After the vcores have exited the guest, the extra ones are put back
      onto the preempted list if any of their VCPUs are still runnable and
      not idle.
      
      This means that vcpu->arch.ptid is no longer necessarily the same as
      the physical thread that the vcpu runs on.  In order to make it easier
      for code that wants to send an IPI to know which CPU to target, we
      now store that in a new field in struct vcpu_arch, called thread_cpu.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Tested-by: NLaurent Vivier <lvivier@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ec257165
    • T
      KVM: PPC: Fix warnings from sparse · 5358a963
      Thomas Huth 提交于
      When compiling the KVM code for POWER with "make C=1", sparse
      complains about functions missing proper prototypes and a 64-bit
      constant missing the ULL prefix. Let's fix this by making the
      functions static or by including the proper header with the
      prototypes, and by appending a ULL prefix to the constant
      PPC_MPPE_ADDRESS_MASK.
      Signed-off-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5358a963
  6. 03 8月, 2015 1 次提交
  7. 28 5月, 2015 1 次提交
  8. 26 5月, 2015 2 次提交
  9. 10 5月, 2015 1 次提交
    • P
      KVM: PPC: Book3S HV: Fix list traversal in error case · 17d48901
      Paul Mackerras 提交于
      This fixes a regression introduced in commit 25fedfca, "KVM: PPC:
      Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu", which
      leads to a user-triggerable oops.
      
      In the case where we try to run a vcore on a physical core that is
      not in single-threaded mode, or the vcore has too many threads for
      the physical core, we iterate the list of runnable vcpus to make
      each one return an EBUSY error to userspace.  Since this involves
      taking each vcpu off the runnable_threads list for the vcore, we
      need to use list_for_each_entry_safe rather than list_for_each_entry
      to traverse the list.  Otherwise the kernel will crash with an oops
      message like this:
      
      Unable to handle kernel paging request for data at address 0x000fff88
      Faulting instruction address: 0xd00000001e635dc8
      Oops: Kernel access of bad area, sig: 11 [#2]
      SMP NR_CPUS=1024 NUMA PowerNV
      ...
      CPU: 48 PID: 91256 Comm: qemu-system-ppc Tainted: G      D        3.18.0 #1
      task: c00000274e507500 ti: c0000027d1924000 task.ti: c0000027d1924000
      NIP: d00000001e635dc8 LR: d00000001e635df8 CTR: c00000000011ba50
      REGS: c0000027d19275b0 TRAP: 0300   Tainted: G      D         (3.18.0)
      MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 22002824  XER: 00000000
      CFAR: c000000000008468 DAR: 00000000000fff88 DSISR: 40000000 SOFTE: 1
      GPR00: d00000001e635df8 c0000027d1927830 d00000001e64c850 0000000000000001
      GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000
      GPR08: 0000000000200200 0000000000000000 0000000000000000 d00000001e63e588
      GPR12: 0000000000002200 c000000007dbc800 c000000fc7800000 000000000000000a
      GPR16: fffffffffffffffc c000000fd5439690 c000000fc7801c98 0000000000000001
      GPR20: 0000000000000003 c0000027d1927aa8 c000000fd543b348 c000000fd543b350
      GPR24: 0000000000000000 c000000fa57f0000 0000000000000030 0000000000000000
      GPR28: fffffffffffffff0 c000000fd543b328 00000000000fe468 c000000fd543b300
      NIP [d00000001e635dc8] kvmppc_run_core+0x198/0x17c0 [kvm_hv]
      LR [d00000001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv]
      Call Trace:
      [c0000027d1927830] [d00000001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv] (unreliable)
      [c0000027d1927a30] [d00000001e638350] kvmppc_vcpu_run_hv+0x5b0/0xdd0 [kvm_hv]
      [c0000027d1927b70] [d00000001e510504] kvmppc_vcpu_run+0x44/0x60 [kvm]
      [c0000027d1927ba0] [d00000001e50d4a4] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
      [c0000027d1927be0] [d00000001e504be8] kvm_vcpu_ioctl+0x5e8/0x7a0 [kvm]
      [c0000027d1927d40] [c0000000002d6720] do_vfs_ioctl+0x490/0x780
      [c0000027d1927de0] [c0000000002d6ae4] SyS_ioctl+0xd4/0xf0
      [c0000027d1927e30] [c000000000009358] syscall_exit+0x0/0x98
      Instruction dump:
      60000000 60420000 387e1b30 38800003 38a00001 38c00000 480087d9 e8410018
      ebde1c98 7fbdf040 3bdee368 419e0048 <813e1b20> 939e1b18 2f890001 409effcc
      ---[ end trace 8cdf50251cca6680 ]---
      
      Fixes: 25fedfcaSigned-off-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      17d48901
  10. 21 4月, 2015 9 次提交
    • P
      KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 · 66feed61
      Paul Mackerras 提交于
      This uses msgsnd where possible for signalling other threads within
      the same core on POWER8 systems, rather than IPIs through the XICS
      interrupt controller.  This includes waking secondary threads to run
      the guest, the interrupts generated by the virtual XICS, and the
      interrupts to bring the other threads out of the guest when exiting.
      
      Aggregated statistics from debugfs across vcpus for a guest with 32
      vcpus, 8 threads/vcore, running on a POWER8, show this before the
      change:
      
       rm_entry:     3387.6ns (228 - 86600, 1008969 samples)
        rm_exit:     4561.5ns (12 - 3477452, 1009402 samples)
        rm_intr:     1660.0ns (12 - 553050, 3600051 samples)
      
      and this after the change:
      
       rm_entry:     3060.1ns (212 - 65138, 953873 samples)
        rm_exit:     4244.1ns (12 - 9693408, 954331 samples)
        rm_intr:     1342.3ns (12 - 1104718, 3405326 samples)
      
      for a test of booting Fedora 20 big-endian to the login prompt.
      
      The time taken for a H_PROD hcall (which is handled in the host
      kernel) went down from about 35 microseconds to about 16 microseconds
      with this change.
      
      The noinline added to kvmppc_run_core turned out to be necessary for
      good performance, at least with gcc 4.9.2 as packaged with Fedora 21
      and a little-endian POWER8 host.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      66feed61
    • P
      KVM: PPC: Book3S HV: Use bitmap of active threads rather than count · 7d6c40da
      Paul Mackerras 提交于
      Currently, the entry_exit_count field in the kvmppc_vcore struct
      contains two 8-bit counts, one of the threads that have started entering
      the guest, and one of the threads that have started exiting the guest.
      This changes it to an entry_exit_map field which contains two bitmaps
      of 8 bits each.  The advantage of doing this is that it gives us a
      bitmap of which threads need to be signalled when exiting the guest.
      That means that we no longer need to use the trick of setting the
      HDEC to 0 to pull the other threads out of the guest, which led in
      some cases to a spurious HDEC interrupt on the next guest entry.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      7d6c40da
    • P
      KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken · 5d5b99cd
      Paul Mackerras 提交于
      We can tell when a secondary thread has finished running a guest by
      the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
      is no real need for the nap_count field in the kvmppc_vcore struct.
      This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
      pointers of the secondary threads rather than polling vc->nap_count.
      Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
      this also means that we can tell which secondary threads have got
      stuck and thus print a more informative error message.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5d5b99cd
    • P
      KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu · 25fedfca
      Paul Mackerras 提交于
      Rather than calling cond_resched() in kvmppc_run_core() before doing
      the post-processing for the vcpus that we have just run (that is,
      calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do
      that post-processing before calling cond_resched(), and that post-
      processing is moved out into its own function, post_guest_process().
      
      The reschedule point is now in kvmppc_run_vcpu() and we define a new
      vcore state, VCORE_PREEMPT, to indicate that that the vcore's runner
      task is runnable but not running.  (Doing the reschedule with the
      vcore in VCORE_INACTIVE state would be bad because there are potentially
      other vcpus waiting for the runner in kvmppc_wait_for_exec() which
      then wouldn't get woken up.)
      
      Also, we make use of the handy cond_resched_lock() function, which
      unlocks and relocks vc->lock for us around the reschedule.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      25fedfca
    • P
      KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update · d911f0be
      Paul Mackerras 提交于
      Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
      update (i.e. one of its 3 virtual processor areas needed to be pinned
      in memory so the host real mode code can update it on guest entry and
      exit), we would drop the vcore lock and do the update there and then.
      Future changes will make it inconvenient to drop the lock, so instead
      we now remove it from the list of runnable VCPUs and wake up its
      VCPU task.  This will have the effect that the VCPU task will exit
      kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
      re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
      to kvmppc_update_vpas() and then rejoin the vcore.
      
      The one complication is that the runner VCPU (whose VCPU task is the
      current task) might be one of the ones that gets removed from the
      runnable list.  In that case we just return from kvmppc_run_core()
      and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
      the runner if necessary.
      
      This all means that the VCORE_STARTING state is no longer used, so we
      remove it.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      d911f0be
    • P
      KVM: PPC: Book3S HV: Accumulate timing information for real-mode code · b6c295df
      Paul Mackerras 提交于
      This reads the timebase at various points in the real-mode guest
      entry/exit code and uses that to accumulate total, minimum and
      maximum time spent in those parts of the code.  Currently these
      times are accumulated per vcpu in 5 parts of the code:
      
      * rm_entry - time taken from the start of kvmppc_hv_entry() until
        just before entering the guest.
      * rm_intr - time from when we take a hypervisor interrupt in the
        guest until we either re-enter the guest or decide to exit to the
        host.  This includes time spent handling hcalls in real mode.
      * rm_exit - time from when we decide to exit the guest until the
        return from kvmppc_hv_entry().
      * guest - time spend in the guest
      * cede - time spent napping in real mode due to an H_CEDE hcall
        while other threads in the same vcore are active.
      
      These times are exposed in debugfs in a directory per vcpu that
      contains a file called "timings".  This file contains one line for
      each of the 5 timings above, with the name followed by a colon and
      4 numbers, which are the count (number of times the code has been
      executed), the total time, the minimum time, and the maximum time,
      all in nanoseconds.
      
      The overhead of the extra code amounts to about 30ns for an hcall that
      is handled in real mode (e.g. H_SET_DABR), which is about 25%.  Since
      production environments may not wish to incur this overhead, the new
      code is conditional on a new config symbol,
      CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b6c295df
    • P
      KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT · e23a808b
      Paul Mackerras 提交于
      This creates a debugfs directory for each HV guest (assuming debugfs
      is enabled in the kernel config), and within that directory, a file
      by which the contents of the guest's HPT (hashed page table) can be
      read.  The directory is named vmnnnn, where nnnn is the PID of the
      process that created the guest.  The file is named "htab".  This is
      intended to help in debugging problems in the host's management
      of guest memory.
      
      The contents of the file consist of a series of lines like this:
      
        3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196
      
      The first field is the index of the entry in the HPT, the second and
      third are the HPT entry, so the third entry contains the real page
      number that is mapped by the entry if the entry's valid bit is set.
      The fourth field is the guest's view of the second doubleword of the
      entry, so it contains the guest physical address.  (The format of the
      second through fourth fields are described in the Power ISA and also
      in arch/powerpc/include/asm/mmu-hash64.h.)
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      e23a808b
    • A
      KVM: PPC: Book3S HV: Remove RMA-related variables from code · 31037eca
      Aneesh Kumar K.V 提交于
      We don't support real-mode areas now that 970 support is removed.
      Remove the remaining details of rma from the code.  Also rename
      rma_setup_done to hpte_setup_done to better reflect the changes.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      31037eca
    • D
      kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM · 99342cf8
      David Gibson 提交于
      On POWER, storage caching is usually configured via the MMU - attributes
      such as cache-inhibited are stored in the TLB and the hashed page table.
      
      This makes correctly performing cache inhibited IO accesses awkward when
      the MMU is turned off (real mode).  Some CPU models provide special
      registers to control the cache attributes of real mode load and stores but
      this is not at all consistent.  This is a problem in particular for SLOF,
      the firmware used on KVM guests, which runs entirely in real mode, but
      which needs to do IO to load the kernel.
      
      To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
      and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
      a logical address (aka guest physical address).  SLOF uses these for IO.
      
      However, because these are implemented within qemu, not the host kernel,
      these bypass any IO devices emulated within KVM itself.  The simplest way
      to see this problem is to attempt to boot a KVM guest from a virtio-blk
      device with iothread / dataplane enabled.  The iothread code relies on an
      in kernel implementation of the virtio queue notification, which is not
      triggered by the IO hcalls, and so the guest will stall in SLOF unable to
      load the guest OS.
      
      This patch addresses this by providing in-kernel implementations of the
      2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
      address not handled by the KVM IO bus will cause a VM exit, hitting the
      qemu implementation as before.
      
      Note that a userspace change is also required, in order to enable these
      new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      [agraf: fix compilation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      99342cf8
  11. 20 3月, 2015 2 次提交
    • P
      KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count · ecb6d618
      Paul Mackerras 提交于
      The VPA (virtual processor area) is defined by PAPR and is therefore
      big-endian, so we need a be32_to_cpu when reading it in
      kvmppc_get_yield_count().  Without this, H_CONFER always fails on a
      little-endian host, causing SMP guests to waste time spinning on
      spinlocks.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ecb6d618
    • P
      KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr() · 8f902b00
      Paul Mackerras 提交于
      Currently, kvmppc_set_lpcr() has a spinlock around the whole function,
      and inside that does mutex_lock(&kvm->lock).  It is not permitted to
      take a mutex while holding a spinlock, because the mutex_lock might
      call schedule().  In addition, this causes lockdep to warn about a
      lock ordering issue:
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      3.18.0-kvm-04645-gdfea862-dirty #131 Not tainted
      -------------------------------------------------------
      qemu-system-ppc/8179 is trying to acquire lock:
       (&kvm->lock){+.+.+.}, at: [<d00000000ecc1f54>] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
      
      but task is already holding lock:
       (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecc1ea0>] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&(&vcore->lock)->rlock){+.+...}:
             [<c000000000b3c120>] .mutex_lock_nested+0x80/0x570
             [<d00000000ecc7a14>] .kvmppc_vcpu_run_hv+0xc4/0xe40 [kvm_hv]
             [<d00000000eb9f5cc>] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
             [<d00000000eb9cb24>] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
             [<d00000000eb94478>] .kvm_vcpu_ioctl+0x4a8/0x7b0 [kvm]
             [<c00000000026cbb4>] .do_vfs_ioctl+0x444/0x770
             [<c00000000026cfa4>] .SyS_ioctl+0xc4/0xe0
             [<c000000000009264>] syscall_exit+0x0/0x98
      
      -> #0 (&kvm->lock){+.+.+.}:
             [<c0000000000ff28c>] .lock_acquire+0xcc/0x1a0
             [<c000000000b3c120>] .mutex_lock_nested+0x80/0x570
             [<d00000000ecc1f54>] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
             [<d00000000ecc510c>] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv]
             [<d00000000eb9f234>] .kvmppc_set_one_reg+0x44/0x330 [kvm]
             [<d00000000eb9c9dc>] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm]
             [<d00000000eb9ced4>] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm]
             [<d00000000eb940b0>] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm]
             [<c00000000026cbb4>] .do_vfs_ioctl+0x444/0x770
             [<c00000000026cfa4>] .SyS_ioctl+0xc4/0xe0
             [<c000000000009264>] syscall_exit+0x0/0x98
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&(&vcore->lock)->rlock);
                                     lock(&kvm->lock);
                                     lock(&(&vcore->lock)->rlock);
        lock(&kvm->lock);
      
       *** DEADLOCK ***
      
      2 locks held by qemu-system-ppc/8179:
       #0:  (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f18>] .vcpu_load+0x28/0x90 [kvm]
       #1:  (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecc1ea0>] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv]
      
      stack backtrace:
      CPU: 4 PID: 8179 Comm: qemu-system-ppc Not tainted 3.18.0-kvm-04645-gdfea862-dirty #131
      Call Trace:
      [c000001a66c0f310] [c000000000b486ac] .dump_stack+0x88/0xb4 (unreliable)
      [c000001a66c0f390] [c0000000000f8bec] .print_circular_bug+0x27c/0x3d0
      [c000001a66c0f440] [c0000000000fe9e8] .__lock_acquire+0x2028/0x2190
      [c000001a66c0f5d0] [c0000000000ff28c] .lock_acquire+0xcc/0x1a0
      [c000001a66c0f6a0] [c000000000b3c120] .mutex_lock_nested+0x80/0x570
      [c000001a66c0f7c0] [d00000000ecc1f54] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
      [c000001a66c0f860] [d00000000ecc510c] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv]
      [c000001a66c0f8d0] [d00000000eb9f234] .kvmppc_set_one_reg+0x44/0x330 [kvm]
      [c000001a66c0f960] [d00000000eb9c9dc] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm]
      [c000001a66c0f9f0] [d00000000eb9ced4] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm]
      [c000001a66c0faf0] [d00000000eb940b0] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm]
      [c000001a66c0fcb0] [c00000000026cbb4] .do_vfs_ioctl+0x444/0x770
      [c000001a66c0fd90] [c00000000026cfa4] .SyS_ioctl+0xc4/0xe0
      [c000001a66c0fe30] [c000000000009264] syscall_exit+0x0/0x98
      
      This fixes it by moving the mutex_lock()/mutex_unlock() pair outside
      the spin-locked region.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8f902b00
  12. 17 12月, 2014 5 次提交
    • S
      KVM: PPC: Book3S HV: Improve H_CONFER implementation · 90fd09f8
      Sam Bobroff 提交于
      Currently the H_CONFER hcall is implemented in kernel virtual mode,
      meaning that whenever a guest thread does an H_CONFER, all the threads
      in that virtual core have to exit the guest.  This is bad for
      performance because it interrupts the other threads even if they
      are doing useful work.
      
      The H_CONFER hcall is called by a guest VCPU when it is spinning on a
      spinlock and it detects that the spinlock is held by a guest VCPU that
      is currently not running on a physical CPU.  The idea is to give this
      VCPU's time slice to the holder VCPU so that it can make progress
      towards releasing the lock.
      
      To avoid having the other threads exit the guest unnecessarily,
      we add a real-mode implementation of H_CONFER that checks whether
      the other threads are doing anything.  If all the other threads
      are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER),
      it returns H_TOO_HARD which causes a guest exit and allows the
      H_CONFER to be handled in virtual mode.
      
      Otherwise it spins for a short time (up to 10 microseconds) to give
      other threads the chance to observe that this thread is trying to
      confer.  The spin loop also terminates when any thread exits the guest
      or when all other threads are idle or trying to confer.  If the
      timeout is reached, the H_CONFER returns H_SUCCESS.  In this case the
      guest VCPU will recheck the spinlock word and most likely call
      H_CONFER again.
      
      This also improves the implementation of the H_CONFER virtual mode
      handler.  If the VCPU is part of a virtual core (vcore) which is
      runnable, there will be a 'runner' VCPU which has taken responsibility
      for running the vcore.  In this case we yield to the runner VCPU
      rather than the target VCPU.
      
      We also introduce a check on the target VCPU's yield count: if it
      differs from the yield count passed to H_CONFER, the target VCPU
      has run since H_CONFER was called and may have already released
      the lock.  This check is required by PAPR.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      90fd09f8
    • P
      KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register · 4a157d61
      Paul Mackerras 提交于
      There are two ways in which a guest instruction can be obtained from
      the guest in the guest exit code in book3s_hv_rmhandlers.S.  If the
      exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal
      instruction), the offending instruction is in the HEIR register
      (Hypervisor Emulation Instruction Register).  If the exit was caused
      by a load or store to an emulated MMIO device, we load the instruction
      from the guest by turning data relocation on and loading the instruction
      with an lwz instruction.
      
      Unfortunately, in the case where the guest has opposite endianness to
      the host, these two methods give results of different endianness, but
      both get put into vcpu->arch.last_inst.  The HEIR value has been loaded
      using guest endianness, whereas the lwz will load the instruction using
      host endianness.  The rest of the code that uses vcpu->arch.last_inst
      assumes it was loaded using host endianness.
      
      To fix this, we define a new vcpu field to store the HEIR value.  Then,
      in kvmppc_handle_exit_hv(), we transfer the value from this new field to
      vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness
      differ.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4a157d61
    • P
      KVM: PPC: Book3S HV: Remove code for PPC970 processors · c17b98cf
      Paul Mackerras 提交于
      This removes the code that was added to enable HV KVM to work
      on PPC970 processors.  The PPC970 is an old CPU that doesn't
      support virtualizing guest memory.  Removing PPC970 support also
      lets us remove the code for allocating and managing contiguous
      real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
      case, the code for pinning pages of guest memory when first
      accessed and keeping track of which pages have been pinned, and
      the code for handling H_ENTER hypercalls in virtual mode.
      
      Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
      The KVM_CAP_PPC_RMA capability now always returns 0.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c17b98cf
    • S
      KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions · 3c78f78a
      Suresh E. Warrier 提交于
      This patch adds trace points in the guest entry and exit code and also
      for exceptions handled by the host in kernel mode - hypercalls and page
      faults. The new events are added to /sys/kernel/debug/tracing/events
      under a new subsystem called kvm_hv.
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3c78f78a
    • P
      KVM: PPC: Book3S HV: Simplify locking around stolen time calculations · 2711e248
      Paul Mackerras 提交于
      Currently the calculations of stolen time for PPC Book3S HV guests
      uses fields in both the vcpu struct and the kvmppc_vcore struct.  The
      fields in the kvmppc_vcore struct are protected by the
      vcpu->arch.tbacct_lock of the vcpu that has taken responsibility for
      running the virtual core.  This works correctly but confuses lockdep,
      because it sees that the code takes the tbacct_lock for a vcpu in
      kvmppc_remove_runnable() and then takes another vcpu's tbacct_lock in
      vcore_stolen_time(), and it thinks there is a possibility of deadlock,
      causing it to print reports like this:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      3.18.0-rc7-kvm-00016-g8db4bc6 #89 Not tainted
      ---------------------------------------------
      qemu-system-ppc/6188 is trying to acquire lock:
       (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb1fe8>] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
      
      but task is already holding lock:
       (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&vcpu->arch.tbacct_lock)->rlock);
        lock(&(&vcpu->arch.tbacct_lock)->rlock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by qemu-system-ppc/6188:
       #0:  (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f98>] .vcpu_load+0x28/0xe0 [kvm]
       #1:  (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecb41b0>] .kvmppc_vcpu_run_hv+0x530/0x1530 [kvm_hv]
       #2:  (&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]
      
      stack backtrace:
      CPU: 40 PID: 6188 Comm: qemu-system-ppc Not tainted 3.18.0-rc7-kvm-00016-g8db4bc6 #89
      Call Trace:
      [c000000b2754f3f0] [c000000000b31b6c] .dump_stack+0x88/0xb4 (unreliable)
      [c000000b2754f470] [c0000000000faeb8] .__lock_acquire+0x1878/0x2190
      [c000000b2754f600] [c0000000000fbf0c] .lock_acquire+0xcc/0x1a0
      [c000000b2754f6d0] [c000000000b2954c] ._raw_spin_lock_irq+0x4c/0x70
      [c000000b2754f760] [d00000000ecb1fe8] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
      [c000000b2754f7f0] [d00000000ecb25b4] .kvmppc_remove_runnable.part.3+0x44/0xd0 [kvm_hv]
      [c000000b2754f880] [d00000000ecb43ec] .kvmppc_vcpu_run_hv+0x76c/0x1530 [kvm_hv]
      [c000000b2754f9f0] [d00000000eb9f46c] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
      [c000000b2754fa60] [d00000000eb9c9a4] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
      [c000000b2754faf0] [d00000000eb94538] .kvm_vcpu_ioctl+0x498/0x760 [kvm]
      [c000000b2754fcb0] [c000000000267eb4] .do_vfs_ioctl+0x444/0x770
      [c000000b2754fd90] [c0000000002682a4] .SyS_ioctl+0xc4/0xe0
      [c000000b2754fe30] [c0000000000092e4] syscall_exit+0x0/0x98
      
      In order to make the locking easier to analyse, we change the code to
      use a spinlock in the kvmppc_vcore struct to protect the stolen_tb and
      preempt_tb fields.  This lock needs to be an irq-safe lock since it is
      used in the kvmppc_core_vcpu_load_hv() and kvmppc_core_vcpu_put_hv()
      functions, which are called with the scheduler rq lock held, which is
      an irq-safe lock.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      2711e248
  13. 15 12月, 2014 2 次提交
    • S
      KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked · 1bc5d59c
      Suresh E. Warrier 提交于
      The kvmppc_vcore_blocked() code does not check for the wait condition
      after putting the process on the wait queue. This means that it is
      possible for an external interrupt to become pending, but the vcpu to
      remain asleep until the next decrementer interrupt.  The fix is to
      make one last check for pending exceptions and ceded state before
      calling schedule().
      Signed-off-by: NSuresh Warrier <warrier@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1bc5d59c
    • M
      KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI · dee6f24c
      Mahesh Salgaonkar 提交于
      When we get an HMI (hypervisor maintenance interrupt) while in a
      guest, we see that guest enters into paused state.  The reason is, in
      kvmppc_handle_exit_hv it falls through default path and returns to
      host instead of resuming guest.  This causes guest to enter into
      paused state.  HMI is a hypervisor only interrupt and it is safe to
      resume the guest since the host has handled it already.  This patch
      adds a switch case to resume the guest.
      
      Without this patch we see guest entering into paused state with following
      console messages:
      
      [ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered]
      [ 3003.329356]  Error detail: Timer facility experienced an error
      [ 3003.329359] 	HMER: 0840000000000000
      [ 3003.329360] 	TFMR: 4a12000980a84000
      [ 3003.329366] vcpu c0000007c35094c0 (40):
      [ 3003.329368] pc  = c0000000000c2ba0  msr = 8000000000009032  trap = e60
      [ 3003.329370] r 0 = c00000000021ddc0  r16 = 0000000000000046
      [ 3003.329372] r 1 = c00000007a02bbd0  r17 = 00003ffff27d5d98
      [ 3003.329375] r 2 = c0000000010980b8  r18 = 00001fffffc9a0b0
      [ 3003.329377] r 3 = c00000000142d6b8  r19 = c00000000142d6b8
      [ 3003.329379] r 4 = 0000000000000002  r20 = 0000000000000000
      [ 3003.329381] r 5 = c00000000524a110  r21 = 0000000000000000
      [ 3003.329383] r 6 = 0000000000000001  r22 = 0000000000000000
      [ 3003.329386] r 7 = 0000000000000000  r23 = c00000000524a110
      [ 3003.329388] r 8 = 0000000000000000  r24 = 0000000000000001
      [ 3003.329391] r 9 = 0000000000000001  r25 = c00000007c31da38
      [ 3003.329393] r10 = c0000000014280b8  r26 = 0000000000000002
      [ 3003.329395] r11 = 746f6f6c2f68656c  r27 = c00000000524a110
      [ 3003.329397] r12 = 0000000028004484  r28 = c00000007c31da38
      [ 3003.329399] r13 = c00000000fe01400  r29 = 0000000000000002
      [ 3003.329401] r14 = 0000000000000046  r30 = c000000003011e00
      [ 3003.329403] r15 = ffffffffffffffba  r31 = 0000000000000002
      [ 3003.329404] ctr = c00000000041a670  lr  = c000000000272520
      [ 3003.329405] srr0 = c00000000007e8d8 srr1 = 9000000000001002
      [ 3003.329406] sprg0 = 0000000000000000 sprg1 = c00000000fe01400
      [ 3003.329407] sprg2 = c00000000fe01400 sprg3 = 0000000000000005
      [ 3003.329408] cr = 48004482  xer = 2000000000000000  dsisr = 42000000
      [ 3003.329409] dar = 0000010015020048
      [ 3003.329410] fault dar = 0000010015020048 dsisr = 42000000
      [ 3003.329411] SLB (8 entries):
      [ 3003.329412]   ESID = c000000008000000 VSID = 40016e7779000510
      [ 3003.329413]   ESID = d000000008000001 VSID = 400142add1000510
      [ 3003.329414]   ESID = f000000008000004 VSID = 4000eb1a81000510
      [ 3003.329415]   ESID = 00001f000800000b VSID = 40004fda0a000d90
      [ 3003.329416]   ESID = 00003f000800000c VSID = 400039f536000d90
      [ 3003.329417]   ESID = 000000001800000d VSID = 0001251b35150d90
      [ 3003.329417]   ESID = 000001000800000e VSID = 4001e46090000d90
      [ 3003.329418]   ESID = d000080008000019 VSID = 40013d349c000400
      [ 3003.329419] lpcr = c048800001847001 sdr1 = 0000001b19000006 last_inst = ffffffff
      [ 3003.329421] trap=0xe60 | pc=0xc0000000000c2ba0 | msr=0x8000000000009032
      [ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered]
      [ 3003.329526]  Error detail: Timer facility experienced an error
      [ 3003.329527] 	HMER: 0840000000000000
      [ 3003.329527] 	TFMR: 4a12000980a94000
      [ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered]
      [ 3006.359792]  Error detail: Timer facility experienced an error
      [ 3006.359795] 	HMER: 0840000000000000
      [ 3006.359797] 	TFMR: 4a12000980a84000
      
       Id    Name                           State
      ----------------------------------------------------
       2     guest2                         running
       3     guest3                         paused
       4     guest4                         running
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      dee6f24c
  14. 22 9月, 2014 3 次提交
  15. 28 7月, 2014 5 次提交
    • A
      KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page · 63fff5c1
      Aneesh Kumar K.V 提交于
      When calculating the lower bits of AVA field, use the shift
      count based on the base page size. Also add the missing segment
      size and remove stale comment.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      63fff5c1
    • S
      Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 · 9678cdaa
      Stewart Smith 提交于
      The POWER8 processor has a Micro Partition Prefetch Engine, which is
      a fancy way of saying "has way to store and load contents of L2 or
      L2+MRU way of L3 cache". We initiate the storing of the log (list of
      addresses) using the logmpp instruction and start restore by writing
      to a SPR.
      
      The logmpp instruction takes parameters in a single 64bit register:
      - starting address of the table to store log of L2/L2+L3 cache contents
        - 32kb for L2
        - 128kb for L2+L3
        - Aligned relative to maximum size of the table (32kb or 128kb)
      - Log control (no-op, L2 only, L2 and L3, abort logout)
      
      We should abort any ongoing logging before initiating one.
      
      To initiate restore, we write to the MPPR SPR. The format of what to write
      to the SPR is similar to the logmpp instruction parameter:
      - starting address of the table to read from (same alignment requirements)
      - table size (no data, until end of table)
      - prefetch rate (from fastest possible to slower. about every 8, 16, 24 or
        32 cycles)
      
      The idea behind loading and storing the contents of L2/L3 cache is to
      reduce memory latency in a system that is frequently swapping vcores on
      a physical CPU.
      
      The best case scenario for doing this is when some vcores are doing very
      cache heavy workloads. The worst case is when they have about 0 cache hits,
      so we just generate needless memory operations.
      
      This implementation just does L2 store/load. In my benchmarks this proves
      to be useful.
      
      Benchmark 1:
       - 16 core POWER8
       - 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
       - No split core/SMT
       - two guests running sysbench memory test.
         sysbench --test=memory --num-threads=8 run
       - one guest running apache bench (of default HTML page)
         ab -n 490000 -c 400 http://localhost/
      
      This benchmark aims to measure performance of real world application (apache)
      where other guests are cache hot with their own workloads. The sysbench memory
      benchmark does pointer sized writes to a (small) memory buffer in a loop.
      
      In this benchmark with this patch I can see an improvement both in requests
      per second (~5%) and in mean and median response times (again, about 5%).
      The spread of minimum and maximum response times were largely unchanged.
      
      benchmark 2:
       - Same VM config as benchmark 1
       - all three guests running sysbench memory benchmark
      
      This benchmark aims to see if there is a positive or negative affect to this
      cache heavy benchmark. Although due to the nature of the benchmark (stores) we
      may not see a difference in performance, but rather hopefully an improvement
      in consistency of performance (when vcore switched in, don't have to wait
      many times for cachelines to be pulled in)
      
      The results of this benchmark are improvements in consistency of performance
      rather than performance itself. With this patch, the few outliers in duration
      go away and we get more consistent performance in each guest.
      
      benchmark 3:
       - same 3 guests and CPU configuration as benchmark 1 and 2.
       - two idle guests
       - 1 guest running STREAM benchmark
      
      This scenario also saw performance improvement with this patch. On Copy and
      Scale workloads from STREAM, I got 5-6% improvement with this patch. For
      Add and triad, it was around 10% (or more).
      
      benchmark 4:
       - same 3 guests as previous benchmarks
       - two guests running sysbench --memory, distinctly different cache heavy
         workload
       - one guest running STREAM benchmark.
      
      Similar improvements to benchmark 3.
      
      benchmark 5:
       - 1 guest, 8 VCPUs, Ubuntu 14.04
       - Host configured with split core (SMT8, subcores-per-core=4)
       - STREAM benchmark
      
      In this benchmark, we see a 10-20% performance improvement across the board
      of STREAM benchmark results with this patch.
      
      Based on preliminary investigation and microbenchmarks
      by Prerna Saxena <prerna@linux.vnet.ibm.com>
      Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9678cdaa
    • S
      Split out struct kvmppc_vcore creation to separate function · de9bdd1a
      Stewart Smith 提交于
      No code changes, just split it out to a function so that with the addition
      of micro partition prefetch buffer allocation (in subsequent patch) looks
      neater and doesn't require excessive indentation.
      Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      de9bdd1a
    • A
      KVM: PPC: Book3S: Fix LPCR one_reg interface · a0840240
      Alexey Kardashevskiy 提交于
      Unfortunately, the LPCR got defined as a 32-bit register in the
      one_reg interface.  This is unfortunate because KVM allows userspace
      to control the DPFD (default prefetch depth) field, which is in the
      upper 32 bits.  The result is that DPFD always get set to 0, which
      reduces performance in the guest.
      
      We can't just change KVM_REG_PPC_LPCR to be a 64-bit register ID,
      since that would break existing userspace binaries.  Instead we define
      a new KVM_REG_PPC_LPCR_64 id which is 64-bit.  Userspace can still use
      the old KVM_REG_PPC_LPCR id, but it now only modifies those fields in
      the bottom 32 bits that userspace can modify (ILE, TC and AIL).
      If userspace uses the new KVM_REG_PPC_LPCR_64 id, it can modify DPFD
      as well.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a0840240
    • A
      KVM: PPC: Book3S HV: Access guest VPA in BE · 02407552
      Alexander Graf 提交于
      There are a few shared data structures between the host and the guest. Most
      of them get registered through the VPA interface.
      
      These data structures are defined to always be in big endian byte order, so
      let's make sure we always access them in big endian.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      02407552