1. 21 9月, 2015 2 次提交
    • G
      KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit · 7e022e71
      Gautham R. Shenoy 提交于
      In guest_exit_cont we call kvmhv_commence_exit which expects the trap
      number as the argument. However r3 doesn't contain the trap number at
      this point and as a result we would be calling the function with a
      spurious trap number.
      
      Fix this by copying r12 into r3 before calling kvmhv_commence_exit as
      r12 contains the trap number.
      
      Cc: stable@vger.kernel.org # v4.1+
      Fixes: eddb60fbSigned-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      7e022e71
    • P
      KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs · 5fc3e64f
      Paul Mackerras 提交于
      This fixes a bug which results in stale vcore pointers being left in
      the per-cpu preempted vcore lists when a VM is destroyed.  The result
      of the stale vcore pointers is usually either a crash or a lockup
      inside collect_piggybacks() when another VM is run.  A typical
      lockup message looks like:
      
      [  472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039]
      [  472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv]
      [  472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ #49
      [  472.162187] task: c000001e38512750 ti: c000001e41bfc000 task.ti: c000001e41bfc000
      [  472.162262] NIP: c00000000096b094 LR: c00000000096b08c CTR: c000000000111130
      [  472.162337] REGS: c000001e41bff520 TRAP: 0901   Not tainted  (4.2.0-kvm+)
      [  472.162399] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24848844  XER: 00000000
      [  472.162588] CFAR: c00000000096b0ac SOFTE: 1
      GPR00: c000000000111170 c000001e41bff7a0 c00000000127df00 0000000000000001
      GPR04: 0000000000000003 0000000000000001 0000000000000000 0000000000874821
      GPR08: c000001e41bff8e0 0000000000000001 0000000000000000 d00000000efde740
      GPR12: c000000000111130 c00000000fdae400
      [  472.163053] NIP [c00000000096b094] _raw_spin_lock_irqsave+0xa4/0x130
      [  472.163117] LR [c00000000096b08c] _raw_spin_lock_irqsave+0x9c/0x130
      [  472.163179] Call Trace:
      [  472.163206] [c000001e41bff7a0] [c000001e41bff7f0] 0xc000001e41bff7f0 (unreliable)
      [  472.163295] [c000001e41bff7e0] [c000000000111170] __wake_up+0x40/0x90
      [  472.163375] [c000001e41bff830] [d00000000efd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv]
      [  472.163465] [c000001e41bffa30] [d00000000efd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
      [  472.163559] [c000001e41bffb70] [d00000000e9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm]
      [  472.163653] [c000001e41bffba0] [d00000000e92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
      [  472.163745] [c000001e41bffbe0] [d00000000e9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
      [  472.163834] [c000001e41bffd40] [c0000000002d0f50] do_vfs_ioctl+0x480/0x7c0
      [  472.163910] [c000001e41bffde0] [c0000000002d1364] SyS_ioctl+0xd4/0xf0
      [  472.163986] [c000001e41bffe30] [c000000000009260] system_call+0x38/0xd0
      [  472.164060] Instruction dump:
      [  472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 60000000 60000000 60420000 8bad02e2
      [  472.164224] 7fc3f378 4b6a57c1 60000000 7c210b78 <e92d0000> 89290009 792affe3 40820070
      
      The bug is that kvmppc_run_vcpu does not correctly handle the case
      where a vcpu task receives a signal while its guest vcpu is executing
      in the guest as a result of being piggy-backed onto the execution of
      another vcore.  In that case we need to wait for the vcpu to finish
      executing inside the guest, and then remove this vcore from the
      preempted vcores list.  That way, we avoid leaving this vcpu's vcore
      on the preempted vcores list when the vcpu gets interrupted.
      
      Fixes: ec257165Reported-by: NThomas Huth <thuth@redhat.com>
      Tested-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      5fc3e64f
  2. 16 9月, 2015 1 次提交
    • P
      KVM: add halt_attempted_poll to VCPU stats · 62bea5bf
      Paolo Bonzini 提交于
      This new statistic can help diagnosing VCPUs that, for any reason,
      trigger bad behavior of halt_poll_ns autotuning.
      
      For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
      like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
      10+20+40+80+160+320+480 = 1110 microseconds out of every
      479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
      is consuming about 30% more CPU than it would use without
      polling.  This would show as an abnormally high number of
      attempted polling compared to the successful polls.
      
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com<
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      62bea5bf
  3. 04 9月, 2015 1 次提交
  4. 03 9月, 2015 2 次提交
    • G
      KVM: PPC: Book3S HV: Exit on H_DOORBELL if HOST_IPI is set · 06554d9f
      Gautham R. Shenoy 提交于
      The code that handles the case when we receive a H_DOORBELL interrupt
      has a comment which says "Hypervisor doorbell - exit only if host IPI
      flag set".  However, the current code does not actually check if the
      host IPI flag is set.  This is due to a comparison instruction that
      got missed.
      
      As a result, the current code performs the exit to host only
      if some sibling thread or a sibling sub-core is exiting to the
      host.  This implies that, an IPI sent to a sibling core in
      (subcores-per-core != 1) mode will be missed by the host unless the
      sibling core is on the exit path to the host.
      
      This patch adds the missing comparison operation which will ensure
      that when HOST_IPI flag is set, we unconditionally exit to the host.
      
      Fixes: 66feed61
      Cc: stable@vger.kernel.org # v4.1+
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      06554d9f
    • G
      KVM: PPC: Book3S HV: Fix race in starting secondary threads · 7f235328
      Gautham R. Shenoy 提交于
      The current dynamic micro-threading code has a race due to which a
      secondary thread naps when it is supposed to be running a vcpu. As a
      side effect of this, on a guest exit, the primary thread in
      kvmppc_wait_for_nap() finds that this secondary thread hasn't cleared
      its vcore pointer. This results in "CPU X seems to be stuck!"
      warnings.
      
      The race is possible since the primary thread on exiting the guests
      only waits for all the secondaries to clear its vcore pointer. It
      subsequently expects the secondary threads to enter nap while it
      unsplits the core. A secondary thread which hasn't yet entered the nap
      will loop in kvm_no_guest until its vcore pointer and the do_nap flag
      are unset. Once the core has been unsplit, a new vcpu thread can grab
      the core and set the do_nap flag *before* setting the vcore pointers
      of the secondary. As a result, the secondary thread will now enter nap
      via kvm_unsplit_nap instead of running the guest vcpu.
      
      Fix this by setting the do_nap flag after setting the vcore pointer in
      the PACA of the secondary in kvmppc_run_core. Also, ensure that a
      secondary thread doesn't nap in kvm_unsplit_nap when the vcore pointer
      in its PACA struct is set.
      
      Fixes: b4deba5cSigned-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      7f235328
  5. 22 8月, 2015 12 次提交
    • S
      KVM: PPC: Book3S: correct width in XER handling · c63517c2
      Sam bobroff 提交于
      In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
      bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
      accessed as such.
      
      This patch corrects places where it is accessed as a 32 bit field by a
      64 bit kernel.  In some cases this is via a 32 bit load or store
      instruction which, depending on endianness, will cause either the
      lower or upper 32 bits to be missed.  In another case it is cast as a
      u32, causing the upper 32 bits to be cleared.
      
      This patch corrects those places by extending the access methods to
      64 bits.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      Reviewed-by: NLaurent Vivier <lvivier@redhat.com>
      Reviewed-by: NThomas Huth <thuth@redhat.com>
      Tested-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c63517c2
    • P
      KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation · 563a1e93
      Paul Mackerras 提交于
      Whenever a vcore state is VCORE_PREEMPT we need to be counting stolen
      time for it.  This currently isn't the case when we have a vcore that
      no longer has any runnable threads in it but still has a runner task,
      so we do an explicit call to kvmppc_core_start_stolen() in that case.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      563a1e93
    • P
      KVM: PPC: Book3S HV: Fix preempted vcore list locking · 402813fe
      Paul Mackerras 提交于
      When a vcore gets preempted, we put it on the preempted vcore list for
      the current CPU.  The runner task then calls schedule() and comes back
      some time later and takes itself off the list.  We need to be careful
      to lock the list that it was put onto, which may not be the list for the
      current CPU since the runner task may have moved to another CPU.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      402813fe
    • P
      KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD · cdeee518
      Paul Mackerras 提交于
      This adds implementations for the H_CLEAR_REF (test and clear reference
      bit) and H_CLEAR_MOD (test and clear changed bit) hypercalls.
      
      When clearing the reference or change bit in the guest view of the HPTE,
      we also have to clear it in the real HPTE so that we can detect future
      references or changes.  When we do so, we transfer the R or C bit value
      to the rmap entry for the underlying host page so that kvm_age_hva_hv(),
      kvm_test_age_hva_hv() and kvmppc_hv_get_dirty_log() know that the page
      has been referenced and/or changed.
      
      These hypercalls are not used by Linux guests.  These implementations
      have been tested using a FreeBSD guest.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      cdeee518
    • P
      KVM: PPC: Book3S HV: Fix bug in dirty page tracking · 08fe1e7b
      Paul Mackerras 提交于
      This fixes a bug in the tracking of pages that get modified by the
      guest.  If the guest creates a large-page HPTE, writes to memory
      somewhere within the large page, and then removes the HPTE, we only
      record the modified state for the first normal page within the large
      page, when in fact the guest might have modified some other normal
      page within the large page.
      
      To fix this we use some unused bits in the rmap entry to record the
      order (log base 2) of the size of the page that was modified, when
      removing an HPTE.  Then in kvm_test_clear_dirty_npages() we use that
      order to return the correct number of modified pages.
      
      The same thing could in principle happen when removing a HPTE at the
      host's request, i.e. when paging out a page, except that we never
      page out large pages, and the guest can only create large-page HPTEs
      if the guest RAM is backed by large pages.  However, we also fix
      this case for the sake of future-proofing.
      
      The reference bit is also subject to the same loss of information.  We
      don't make the same fix here for the reference bit because there isn't
      an interface for userspace to find out which pages the guest has
      referenced, whereas there is one for userspace to find out which pages
      the guest has modified.  Because of this loss of information, the
      kvm_age_hva_hv() and kvm_test_age_hva_hv() functions might incorrectly
      say that a page has not been referenced when it has, but that doesn't
      matter greatly because we never page or swap out large pages.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      08fe1e7b
    • P
      KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE · 1e5bf454
      Paul Mackerras 提交于
      The reference (R) and change (C) bits in a HPT entry can be set by
      hardware at any time up until the HPTE is invalidated and the TLB
      invalidation sequence has completed.  This means that when removing
      a HPTE, we need to read the HPTE after the invalidation sequence has
      completed in order to obtain reliable values of R and C.  The code
      in kvmppc_do_h_remove() used to do this.  However, commit 6f22bd32
      ("KVM: PPC: Book3S HV: Make HTAB code LE host aware") removed the
      read after invalidation as a side effect of other changes.  This
      restores the read of the HPTE after invalidation.
      
      The user-visible effect of this bug would be that when migrating a
      guest, there is a small probability that a page modified by the guest
      and then unmapped by the guest might not get re-transmitted and thus
      the destination might end up with a stale copy of the page.
      
      Fixes: 6f22bd32Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1e5bf454
    • P
      KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8 · b4deba5c
      Paul Mackerras 提交于
      This builds on the ability to run more than one vcore on a physical
      core by using the micro-threading (split-core) modes of the POWER8
      chip.  Previously, only vcores from the same VM could be run together,
      and (on POWER8) only if they had just one thread per core.  With the
      ability to split the core on guest entry and unsplit it on guest exit,
      we can run up to 8 vcpu threads from up to 4 different VMs, and we can
      run multiple vcores with 2 or 4 vcpus per vcore.
      
      Dynamic micro-threading is only available if the static configuration
      of the cores is whole-core mode (unsplit), and only on POWER8.
      
      To manage this, we introduce a new kvm_split_mode struct which is
      shared across all of the subcores in the core, with a pointer in the
      paca on each thread.  In addition we extend the core_info struct to
      have information on each subcore.  When deciding whether to add a
      vcore to the set already on the core, we now have two possibilities:
      (a) piggyback the vcore onto an existing subcore, or (b) start a new
      subcore.
      
      Currently, when any vcpu needs to exit the guest and switch to host
      virtual mode, we interrupt all the threads in all subcores and switch
      the core back to whole-core mode.  It may be possible in future to
      allow some of the subcores to keep executing in the guest while
      subcore 0 switches to the host, but that is not implemented in this
      patch.
      
      This adds a module parameter called dynamic_mt_modes which controls
      which micro-threading (split-core) modes the code will consider, as a
      bitmap.  In other words, if it is 0, no micro-threading mode is
      considered; if it is 2, only 2-way micro-threading is considered; if
      it is 4, only 4-way, and if it is 6, both 2-way and 4-way
      micro-threading mode will be considered.  The default is 6.
      
      With this, we now have secondary threads which are the primary thread
      for their subcore and therefore need to do the MMU switch.  These
      threads will need to be started even if they have no vcpu to run, so
      we use the vcore pointer in the PACA rather than the vcpu pointer to
      trigger them.
      
      It is now possible for thread 0 to find that an exit has been
      requested before it gets to switch the subcore state to the guest.  In
      that case we haven't added the guest's timebase offset to the
      timebase, so we need to be careful not to subtract the offset in the
      guest exit path.  In fact we just skip the whole path that switches
      back to host context, since we haven't switched to the guest context.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b4deba5c
    • P
      KVM: PPC: Book3S HV: Make use of unused threads when running guests · ec257165
      Paul Mackerras 提交于
      When running a virtual core of a guest that is configured with fewer
      threads per core than the physical cores have, the extra physical
      threads are currently unused.  This makes it possible to use them to
      run one or more other virtual cores from the same guest when certain
      conditions are met.  This applies on POWER7, and on POWER8 to guests
      with one thread per virtual core.  (It doesn't apply to POWER8 guests
      with multiple threads per vcore because they require a 1-1 virtual to
      physical thread mapping in order to be able to use msgsndp and the
      TIR.)
      
      The idea is that we maintain a list of preempted vcores for each
      physical cpu (i.e. each core, since the host runs single-threaded).
      Then, when a vcore is about to run, it checks to see if there are
      any vcores on the list for its physical cpu that could be
      piggybacked onto this vcore's execution.  If so, those additional
      vcores are put into state VCORE_PIGGYBACK and their runnable VCPU
      threads are started as well as the original vcore, which is called
      the master vcore.
      
      After the vcores have exited the guest, the extra ones are put back
      onto the preempted list if any of their VCPUs are still runnable and
      not idle.
      
      This means that vcpu->arch.ptid is no longer necessarily the same as
      the physical thread that the vcpu runs on.  In order to make it easier
      for code that wants to send an IPI to know which CPU to target, we
      now store that in a new field in struct vcpu_arch, called thread_cpu.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Tested-by: NLaurent Vivier <lvivier@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ec257165
    • T
      KVM: PPC: add missing pt_regs initialization · 845ac985
      Tudor Laurentiu 提交于
      On this switch branch the regs initialization
      doesn't happen so add it.
      This was found with the help of a static
      code analysis tool.
      Signed-off-by: NLaurentiu Tudor <Laurentiu.Tudor@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      845ac985
    • T
      KVM: PPC: Fix warnings from sparse · 5358a963
      Thomas Huth 提交于
      When compiling the KVM code for POWER with "make C=1", sparse
      complains about functions missing proper prototypes and a 64-bit
      constant missing the ULL prefix. Let's fix this by making the
      functions static or by including the proper header with the
      prototypes, and by appending a ULL prefix to the constant
      PPC_MPPE_ADDRESS_MASK.
      Signed-off-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5358a963
    • T
      KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig · 129fd423
      Thomas Huth 提交于
      Since the PPC970 support has been removed from the kvm-hv kernel
      module recently, we should also reflect this change in the help
      text of the corresponding Kconfig option.
      Signed-off-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      129fd423
    • T
      KVM: PPC: fix suspicious use of conditional operator · f5ffe330
      Tudor Laurentiu 提交于
      This was signaled by a static code analysis tool.
      Signed-off-by: NLaurentiu Tudor <Laurentiu.Tudor@freescale.com>
      Reviewed-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f5ffe330
  6. 07 8月, 2015 1 次提交
  7. 03 8月, 2015 1 次提交
  8. 07 6月, 2015 1 次提交
  9. 28 5月, 2015 1 次提交
  10. 26 5月, 2015 2 次提交
  11. 10 5月, 2015 1 次提交
    • P
      KVM: PPC: Book3S HV: Fix list traversal in error case · 17d48901
      Paul Mackerras 提交于
      This fixes a regression introduced in commit 25fedfca, "KVM: PPC:
      Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu", which
      leads to a user-triggerable oops.
      
      In the case where we try to run a vcore on a physical core that is
      not in single-threaded mode, or the vcore has too many threads for
      the physical core, we iterate the list of runnable vcpus to make
      each one return an EBUSY error to userspace.  Since this involves
      taking each vcpu off the runnable_threads list for the vcore, we
      need to use list_for_each_entry_safe rather than list_for_each_entry
      to traverse the list.  Otherwise the kernel will crash with an oops
      message like this:
      
      Unable to handle kernel paging request for data at address 0x000fff88
      Faulting instruction address: 0xd00000001e635dc8
      Oops: Kernel access of bad area, sig: 11 [#2]
      SMP NR_CPUS=1024 NUMA PowerNV
      ...
      CPU: 48 PID: 91256 Comm: qemu-system-ppc Tainted: G      D        3.18.0 #1
      task: c00000274e507500 ti: c0000027d1924000 task.ti: c0000027d1924000
      NIP: d00000001e635dc8 LR: d00000001e635df8 CTR: c00000000011ba50
      REGS: c0000027d19275b0 TRAP: 0300   Tainted: G      D         (3.18.0)
      MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 22002824  XER: 00000000
      CFAR: c000000000008468 DAR: 00000000000fff88 DSISR: 40000000 SOFTE: 1
      GPR00: d00000001e635df8 c0000027d1927830 d00000001e64c850 0000000000000001
      GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000
      GPR08: 0000000000200200 0000000000000000 0000000000000000 d00000001e63e588
      GPR12: 0000000000002200 c000000007dbc800 c000000fc7800000 000000000000000a
      GPR16: fffffffffffffffc c000000fd5439690 c000000fc7801c98 0000000000000001
      GPR20: 0000000000000003 c0000027d1927aa8 c000000fd543b348 c000000fd543b350
      GPR24: 0000000000000000 c000000fa57f0000 0000000000000030 0000000000000000
      GPR28: fffffffffffffff0 c000000fd543b328 00000000000fe468 c000000fd543b300
      NIP [d00000001e635dc8] kvmppc_run_core+0x198/0x17c0 [kvm_hv]
      LR [d00000001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv]
      Call Trace:
      [c0000027d1927830] [d00000001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv] (unreliable)
      [c0000027d1927a30] [d00000001e638350] kvmppc_vcpu_run_hv+0x5b0/0xdd0 [kvm_hv]
      [c0000027d1927b70] [d00000001e510504] kvmppc_vcpu_run+0x44/0x60 [kvm]
      [c0000027d1927ba0] [d00000001e50d4a4] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
      [c0000027d1927be0] [d00000001e504be8] kvm_vcpu_ioctl+0x5e8/0x7a0 [kvm]
      [c0000027d1927d40] [c0000000002d6720] do_vfs_ioctl+0x490/0x780
      [c0000027d1927de0] [c0000000002d6ae4] SyS_ioctl+0xd4/0xf0
      [c0000027d1927e30] [c000000000009358] syscall_exit+0x0/0x98
      Instruction dump:
      60000000 60420000 387e1b30 38800003 38a00001 38c00000 480087d9 e8410018
      ebde1c98 7fbdf040 3bdee368 419e0048 <813e1b20> 939e1b18 2f890001 409effcc
      ---[ end trace 8cdf50251cca6680 ]---
      
      Fixes: 25fedfcaSigned-off-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      17d48901
  12. 07 5月, 2015 2 次提交
  13. 29 4月, 2015 1 次提交
  14. 21 4月, 2015 12 次提交
    • P
      KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 · 66feed61
      Paul Mackerras 提交于
      This uses msgsnd where possible for signalling other threads within
      the same core on POWER8 systems, rather than IPIs through the XICS
      interrupt controller.  This includes waking secondary threads to run
      the guest, the interrupts generated by the virtual XICS, and the
      interrupts to bring the other threads out of the guest when exiting.
      
      Aggregated statistics from debugfs across vcpus for a guest with 32
      vcpus, 8 threads/vcore, running on a POWER8, show this before the
      change:
      
       rm_entry:     3387.6ns (228 - 86600, 1008969 samples)
        rm_exit:     4561.5ns (12 - 3477452, 1009402 samples)
        rm_intr:     1660.0ns (12 - 553050, 3600051 samples)
      
      and this after the change:
      
       rm_entry:     3060.1ns (212 - 65138, 953873 samples)
        rm_exit:     4244.1ns (12 - 9693408, 954331 samples)
        rm_intr:     1342.3ns (12 - 1104718, 3405326 samples)
      
      for a test of booting Fedora 20 big-endian to the login prompt.
      
      The time taken for a H_PROD hcall (which is handled in the host
      kernel) went down from about 35 microseconds to about 16 microseconds
      with this change.
      
      The noinline added to kvmppc_run_core turned out to be necessary for
      good performance, at least with gcc 4.9.2 as packaged with Fedora 21
      and a little-endian POWER8 host.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      66feed61
    • P
      KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C · eddb60fb
      Paul Mackerras 提交于
      This replaces the assembler code for kvmhv_commence_exit() with C code
      in book3s_hv_builtin.c.  It also moves the IPI sending code that was
      in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it
      can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq().
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      eddb60fb
    • P
      KVM: PPC: Book3S HV: Streamline guest entry and exit · 6af27c84
      Paul Mackerras 提交于
      On entry to the guest, secondary threads now wait for the primary to
      switch the MMU after loading up most of their state, rather than before.
      This means that the secondary threads get into the guest sooner, in the
      common case where the secondary threads get to kvmppc_hv_entry before
      the primary thread.
      
      On exit, the first thread out increments the exit count and interrupts
      the other threads (to get them out of the guest) before saving most
      of its state, rather than after.  That means that the other threads
      exit sooner and means that the first thread doesn't spend so much
      time waiting for the other threads at the point where the MMU gets
      switched back to the host.
      
      This pulls out the code that increments the exit count and interrupts
      other threads into a separate function, kvmhv_commence_exit().
      This also makes sure that r12 and vcpu->arch.trap are set correctly
      in some corner cases.
      
      Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
      improvement.  Aggregating across vcpus for a guest with 32 vcpus,
      8 threads/vcore, running on a POWER8, gives this before the change:
      
       rm_entry:     avg 4537.3ns (222 - 48444, 1068878 samples)
        rm_exit:     avg 4787.6ns (152 - 165490, 1010717 samples)
        rm_intr:     avg 1673.6ns (12 - 341304, 3818691 samples)
      
      and this after the change:
      
       rm_entry:     avg 3427.7ns (232 - 68150, 1118921 samples)
        rm_exit:     avg 4716.0ns (12 - 150720, 1119477 samples)
        rm_intr:     avg 1614.8ns (12 - 522436, 3850432 samples)
      
      showing a substantial reduction in the time spent per guest entry in
      the real-mode guest entry code, and smaller reductions in the real
      mode guest exit and interrupt handling times.  (The test was to start
      the guest and boot Fedora 20 big-endian to the login prompt.)
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      6af27c84
    • P
      KVM: PPC: Book3S HV: Use bitmap of active threads rather than count · 7d6c40da
      Paul Mackerras 提交于
      Currently, the entry_exit_count field in the kvmppc_vcore struct
      contains two 8-bit counts, one of the threads that have started entering
      the guest, and one of the threads that have started exiting the guest.
      This changes it to an entry_exit_map field which contains two bitmaps
      of 8 bits each.  The advantage of doing this is that it gives us a
      bitmap of which threads need to be signalled when exiting the guest.
      That means that we no longer need to use the trick of setting the
      HDEC to 0 to pull the other threads out of the guest, which led in
      some cases to a spurious HDEC interrupt on the next guest entry.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      7d6c40da
    • P
      KVM: PPC: Book3S HV: Use decrementer to wake napping threads · fd6d53b1
      Paul Mackerras 提交于
      This arranges for threads that are napping due to their vcpu having
      ceded or due to not having a vcpu to wake up at the end of the guest's
      timeslice without having to be poked with an IPI.  We do that by
      arranging for the decrementer to contain a value no greater than the
      number of timebase ticks remaining until the end of the timeslice.
      In the case of a thread with no vcpu, this number is in the hypervisor
      decrementer already.  In the case of a ceded vcpu, we use the smaller
      of the HDEC value and the DEC value.
      
      Using the DEC like this when ceded means we need to save and restore
      the guest decrementer value around the nap.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      fd6d53b1
    • P
      KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI · ccc07772
      Paul Mackerras 提交于
      When running a multi-threaded guest and vcpu 0 in a virtual core
      is not running in the guest (i.e. it is busy elsewhere in the host),
      thread 0 of the physical core will switch the MMU to the guest and
      then go to nap mode in the code at kvm_do_nap.  If the guest sends
      an IPI to thread 0 using the msgsndp instruction, that will wake
      up thread 0 and cause all the threads in the guest to exit to the
      host unnecessarily.  To avoid the unnecessary exit, this arranges
      for the PECEDP bit to be cleared in this situation.  When napping
      due to a H_CEDE from the guest, we still set PECEDP so that the
      thread will wake up on an IPI sent using msgsndp.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ccc07772
    • P
      KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken · 5d5b99cd
      Paul Mackerras 提交于
      We can tell when a secondary thread has finished running a guest by
      the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
      is no real need for the nap_count field in the kvmppc_vcore struct.
      This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
      pointers of the secondary threads rather than polling vc->nap_count.
      Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
      this also means that we can tell which secondary threads have got
      stuck and thus print a more informative error message.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5d5b99cd
    • P
      KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu · 25fedfca
      Paul Mackerras 提交于
      Rather than calling cond_resched() in kvmppc_run_core() before doing
      the post-processing for the vcpus that we have just run (that is,
      calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do
      that post-processing before calling cond_resched(), and that post-
      processing is moved out into its own function, post_guest_process().
      
      The reschedule point is now in kvmppc_run_vcpu() and we define a new
      vcore state, VCORE_PREEMPT, to indicate that that the vcore's runner
      task is runnable but not running.  (Doing the reschedule with the
      vcore in VCORE_INACTIVE state would be bad because there are potentially
      other vcpus waiting for the runner in kvmppc_wait_for_exec() which
      then wouldn't get woken up.)
      
      Also, we make use of the handy cond_resched_lock() function, which
      unlocks and relocks vc->lock for us around the reschedule.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      25fedfca
    • P
      KVM: PPC: Book3S HV: Minor cleanups · 1f09c3ed
      Paul Mackerras 提交于
      * Remove unused kvmppc_vcore::n_busy field.
      * Remove setting of RMOR, since it was only used on PPC970 and the
        PPC970 KVM support has been removed.
      * Don't use r1 or r2 in setting the runlatch since they are
        conventionally reserved for other things; use r0 instead.
      * Streamline the code a little and remove the ext_interrupt_to_host
        label.
      * Add some comments about register usage.
      * hcall_try_real_mode doesn't need to be global, and can't be
        called from C code anyway.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1f09c3ed
    • P
      KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update · d911f0be
      Paul Mackerras 提交于
      Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
      update (i.e. one of its 3 virtual processor areas needed to be pinned
      in memory so the host real mode code can update it on guest entry and
      exit), we would drop the vcore lock and do the update there and then.
      Future changes will make it inconvenient to drop the lock, so instead
      we now remove it from the list of runnable VCPUs and wake up its
      VCPU task.  This will have the effect that the VCPU task will exit
      kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
      re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
      to kvmppc_update_vpas() and then rejoin the vcore.
      
      The one complication is that the runner VCPU (whose VCPU task is the
      current task) might be one of the ones that gets removed from the
      runnable list.  In that case we just return from kvmppc_run_core()
      and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
      the runner if necessary.
      
      This all means that the VCORE_STARTING state is no longer used, so we
      remove it.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      d911f0be
    • P
      KVM: PPC: Book3S HV: Accumulate timing information for real-mode code · b6c295df
      Paul Mackerras 提交于
      This reads the timebase at various points in the real-mode guest
      entry/exit code and uses that to accumulate total, minimum and
      maximum time spent in those parts of the code.  Currently these
      times are accumulated per vcpu in 5 parts of the code:
      
      * rm_entry - time taken from the start of kvmppc_hv_entry() until
        just before entering the guest.
      * rm_intr - time from when we take a hypervisor interrupt in the
        guest until we either re-enter the guest or decide to exit to the
        host.  This includes time spent handling hcalls in real mode.
      * rm_exit - time from when we decide to exit the guest until the
        return from kvmppc_hv_entry().
      * guest - time spend in the guest
      * cede - time spent napping in real mode due to an H_CEDE hcall
        while other threads in the same vcore are active.
      
      These times are exposed in debugfs in a directory per vcpu that
      contains a file called "timings".  This file contains one line for
      each of the 5 timings above, with the name followed by a colon and
      4 numbers, which are the count (number of times the code has been
      executed), the total time, the minimum time, and the maximum time,
      all in nanoseconds.
      
      The overhead of the extra code amounts to about 30ns for an hcall that
      is handled in real mode (e.g. H_SET_DABR), which is about 25%.  Since
      production environments may not wish to incur this overhead, the new
      code is conditional on a new config symbol,
      CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b6c295df
    • P
      KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT · e23a808b
      Paul Mackerras 提交于
      This creates a debugfs directory for each HV guest (assuming debugfs
      is enabled in the kernel config), and within that directory, a file
      by which the contents of the guest's HPT (hashed page table) can be
      read.  The directory is named vmnnnn, where nnnn is the PID of the
      process that created the guest.  The file is named "htab".  This is
      intended to help in debugging problems in the host's management
      of guest memory.
      
      The contents of the file consist of a series of lines like this:
      
        3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196
      
      The first field is the index of the entry in the HPT, the second and
      third are the HPT entry, so the third entry contains the real page
      number that is mapped by the entry if the entry's valid bit is set.
      The fourth field is the guest's view of the second doubleword of the
      entry, so it contains the guest physical address.  (The format of the
      second through fourth fields are described in the Power ISA and also
      in arch/powerpc/include/asm/mmu-hash64.h.)
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      e23a808b