1. 22 Sep 2014 (3 commits)
  2. 28 Jul 2014 (10 commits)
    • KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page · 63fff5c1
      Authored by Aneesh Kumar K.V
      When calculating the lower bits of the AVA field, use the shift
      count based on the base page size. Also add the missing segment
      size and remove a stale comment.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 · 9678cdaa
      Authored by Stewart Smith
      The POWER8 processor has a Micro Partition Prefetch Engine, which is
      a fancy way of saying "it has a way to store and load the contents of
      the L2, or of the L2 plus the MRU way of the L3, cache". We initiate
      the storing of the log (a list of addresses) using the logmpp
      instruction and start the restore by writing to an SPR.
      
      The logmpp instruction takes its parameters in a single 64-bit register:
      - starting address of the table in which to store the log of L2/L2+L3 cache contents
        - 32kB for L2
        - 128kB for L2+L3
        - aligned to the maximum table size (32kB or 128kB)
      - log control (no-op, L2 only, L2 and L3, abort logout)
      
      We should abort any ongoing logging before initiating a new one.
      
      To initiate restore, we write to the MPPR SPR. The format of what to write
      to the SPR is similar to the logmpp instruction parameter:
      - starting address of the table to read from (same alignment requirements)
      - table size (no data, until end of table)
      - prefetch rate (from fastest possible to slower: about every 8, 16, 24 or
        32 cycles)
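
      As a rough sketch of how the two halves might be driven (the helper names,
      SPR constant and bit encodings below are illustrative assumptions, not the
      ones added by this patch):

          /* Illustrative sketch only: encodings are assumed for the example. */
          #define MPP_CTRL_ABORT      0x3UL   /* abort an in-progress logout */
          #define MPP_CTRL_LOG_L2     0x1UL   /* log L2 contents only */
          #define MPP_RATE_FASTEST    0x0UL   /* prefetch as fast as possible */

          static inline void logmpp(unsigned long x)
          {
                  /* issue the logmpp instruction with the packed parameter */
                  asm volatile("logmpp %0" : : "r" (x) : "memory");
          }

          static void mpp_save_l2(unsigned long buf_phys)  /* 32kB-aligned */
          {
                  logmpp(buf_phys | MPP_CTRL_ABORT);   /* stop a logout in flight */
                  logmpp(buf_phys | MPP_CTRL_LOG_L2);  /* then start a new one */
          }

          static void mpp_restore_l2(unsigned long buf_phys)
          {
                  /* writing MPPR starts prefetching from the stored table */
                  mtspr(SPRN_MPPR, buf_phys | MPP_RATE_FASTEST);
          }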
      
      The idea behind loading and storing the contents of L2/L3 cache is to
      reduce memory latency in a system that is frequently swapping vcores on
      a physical CPU.
      
      The best case scenario for doing this is when some vcores are running very
      cache-heavy workloads. The worst case is when they get almost no cache hits,
      so we just generate needless memory operations.
      
      This implementation just does L2 store/load. In my benchmarks this proves
      to be useful.
      
      Benchmark 1:
       - 16 core POWER8
       - 3x Ubuntu 14.04 LTS guests (LE) with 8 VCPUs each
       - No split core/SMT
       - two guests running sysbench memory test.
         sysbench --test=memory --num-threads=8 run
       - one guest running apache bench (of default HTML page)
         ab -n 490000 -c 400 http://localhost/
      
      This benchmark aims to measure the performance of a real-world application
      (apache) while the other guests are cache-hot with their own workloads. The
      sysbench memory benchmark does pointer-sized writes to a (small) memory
      buffer in a loop.
      
      In this benchmark, with this patch, I can see an improvement both in requests
      per second (~5%) and in mean and median response times (again, about 5%).
      The spread of minimum and maximum response times was largely unchanged.
      
      Benchmark 2:
       - Same VM config as benchmark 1
       - all three guests running sysbench memory benchmark
      
      This benchmark aims to see whether this patch has a positive or negative
      effect on an already cache-heavy workload. Due to the nature of the benchmark
      (stores), we may not see a difference in raw performance, but rather an
      improvement in the consistency of performance (when a vcore is switched in,
      it doesn't have to wait as often for cachelines to be pulled in).
      
      The results of this benchmark are improvements in consistency of performance
      rather than performance itself. With this patch, the few outliers in duration
      go away and we get more consistent performance in each guest.
      
      Benchmark 3:
       - same 3 guests and CPU configuration as benchmark 1 and 2.
       - two idle guests
       - 1 guest running STREAM benchmark
      
      This scenario also saw a performance improvement with this patch. On the
      Copy and Scale workloads from STREAM, I got a 5-6% improvement with this
      patch. For Add and Triad, it was around 10% (or more).
      
      Benchmark 4:
       - same 3 guests as previous benchmarks
       - two guests running sysbench --memory, a distinctly different cache-heavy
         workload
       - one guest running STREAM benchmark.
      
      Similar improvements to benchmark 3.
      
      Benchmark 5:
       - 1 guest, 8 VCPUs, Ubuntu 14.04
       - Host configured with split core (SMT8, subcores-per-core=4)
       - STREAM benchmark
      
      In this benchmark, we see a 10-20% performance improvement across the board
      in the STREAM benchmark results with this patch.
      
      Based on preliminary investigation and microbenchmarks
      by Prerna Saxena <prerna@linux.vnet.ibm.com>
      Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • Split out struct kvmppc_vcore creation to separate function · de9bdd1a
      Authored by Stewart Smith
      No code changes; just split vcore creation out into a separate function so
      that the addition of micro partition prefetch buffer allocation (in a
      subsequent patch) looks neater and doesn't require excessive indentation.
      Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S: Fix LPCR one_reg interface · a0840240
      Authored by Alexey Kardashevskiy
      Unfortunately, the LPCR got defined as a 32-bit register in the
      one_reg interface.  This is unfortunate because KVM allows userspace
      to control the DPFD (default prefetch depth) field, which is in the
      upper 32 bits.  The result is that DPFD always gets set to 0, which
      reduces performance in the guest.
      
      We can't just change KVM_REG_PPC_LPCR to be a 64-bit register ID,
      since that would break existing userspace binaries.  Instead we define
      a new KVM_REG_PPC_LPCR_64 id which is 64-bit.  Userspace can still use
      the old KVM_REG_PPC_LPCR id, but it now only modifies those fields in
      the bottom 32 bits that userspace can modify (ILE, TC and AIL).
      If userspace uses the new KVM_REG_PPC_LPCR_64 id, it can modify DPFD
      as well.
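
      For example, userspace wanting DPFD control might use the new ID along
      these lines (a minimal sketch; error handling elided):

          #include <stdint.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          /* Set the full 64-bit LPCR (including DPFD) on one vcpu fd. */
          static int set_lpcr64(int vcpu_fd, uint64_t lpcr)
          {
                  struct kvm_one_reg reg = {
                          .id   = KVM_REG_PPC_LPCR_64,  /* the new 64-bit ID */
                          .addr = (uint64_t)(uintptr_t)&lpcr,
                  };

                  return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
          }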
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Access guest VPA in BE · 02407552
      Authored by Alexander Graf
      There are a few shared data structures between the host and the guest. Most
      of them get registered through the VPA interface.
      
      These data structures are defined to always be in big endian byte order, so
      let's make sure we always access them in big endian.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling · 9642382e
      Authored by Michael Neuling
      This adds support for the H_SET_MODE hcall.  This hcall is a
      multiplexer that has several functions, some of which are called
      rarely, and some which are potentially called very frequently.
      Here we add support for the functions that set the debug registers
      CIABR (Completed Instruction Address Breakpoint Register) and
      DAWR/DAWRX (Data Address Watchpoint Register and eXtension),
      since they could be updated by the guest as often as every context
      switch.
      
      This also adds a kvmppc_power8_compatible() function to test whether
      a guest is compatible with POWER8.  The CIABR and DAWR/X registers
      only exist on POWER8.
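
      The multiplexer shape this implies is roughly the following sketch (return
      codes are from the PAPR hcall ABI; the resource names and vcpu fields are
      assumptions based on this description, not the exact patch):

          static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
                                       unsigned long resource, unsigned long value1,
                                       unsigned long value2)
          {
                  switch (resource) {
                  case H_SET_MODE_RESOURCE_SET_CIABR:
                          if (!kvmppc_power8_compatible(vcpu))
                                  return H_P2;    /* CIABR exists only on POWER8 */
                          vcpu->arch.ciabr = value1;
                          return H_SUCCESS;
                  case H_SET_MODE_RESOURCE_SET_DAWR:
                          if (!kvmppc_power8_compatible(vcpu))
                                  return H_P2;
                          vcpu->arch.dawr  = value1;   /* watchpoint address */
                          vcpu->arch.dawrx = value2;   /* watchpoint extension */
                          return H_SUCCESS;
                  default:
                          /* rarely-used functions can keep exiting to userspace */
                          return H_TOO_HARD;
                  }
          }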
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled · ae2113a4
      Authored by Paul Mackerras
      This adds code to check, when the KVM_CAP_PPC_ENABLE_HCALL
      capability is used to enable or disable in-kernel handling of an
      hcall, that the hcall is actually implemented by the kernel.
      If not, an EINVAL error is returned.
      
      This also checks the default-enabled list of hcalls and prints a
      warning if any hcall there is not actually implemented.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling · 699a0ea0
      Authored by Paul Mackerras
      This provides a way for userspace to control which sPAPR hcalls get
      handled in the kernel.  Each hcall can be individually enabled or
      disabled for in-kernel handling, except for H_RTAS.  The exception
      for H_RTAS is because userspace can already control whether
      individual RTAS functions are handled in-kernel or not via the
      KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for
      H_RTAS is out of the normal sequence of hcall numbers.
      
      Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the
      KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM.
      The args field of the struct kvm_enable_cap specifies the hcall number
      in args[0] and the enable/disable flag in args[1]; 0 means disable
      in-kernel handling (so that the hcall will always cause an exit to
      userspace) and 1 means enable.  Enabling or disabling in-kernel
      handling of an hcall is effective across the whole VM.
      
      The ability for KVM_ENABLE_CAP to be used on a VM file descriptor
      on PowerPC is new, added by this commit.  The KVM_CAP_ENABLE_CAP_VM
      capability advertises that this ability exists.
      
      When a VM is created, an initial set of hcalls are enabled for
      in-kernel handling.  The set that is enabled is the set that have
      an in-kernel implementation at this point.  Any new hcall
      implementations from this point onwards should not be added to the
      default set without a good reason.
      
      No distinction is made between real-mode and virtual-mode hcall
      implementations; the one setting controls them both.
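
      From userspace this looks something like the following sketch (error
      handling elided):

          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          /* Enable (1) or disable (0) in-kernel handling of one hcall, VM-wide. */
          static int set_hcall_enabled(int vm_fd, unsigned long hcall, int enable)
          {
                  struct kvm_enable_cap cap = {
                          .cap  = KVM_CAP_PPC_ENABLE_HCALL,
                          .args = { hcall, enable },  /* args[0]=hcall#, args[1]=0/1 */
                  };

                  return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
          }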
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: BOOK3S: PR: Emulate instruction counter · 06da28e7
      Authored by Aneesh Kumar K.V
      Writing to the IC (instruction counter) register is not allowed in
      privileged mode.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: BOOK3S: PR: Emulate virtual timebase register · 8f42ab27
      Authored by Aneesh Kumar K.V
      The virtual timebase (VTB) register is a per-VM, per-cpu register that
      needs to be saved and restored on VM exit and entry. Writing to VTB is
      not allowed in privileged mode.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [agraf: fix compile error]
      Signed-off-by: Alexander Graf <agraf@suse.de>
  3. 30 May 2014 (5 commits)
    • KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest · 1f365bb0
      Authored by Aneesh Kumar K.V
      On recent IBM Power CPUs, while the hashed page table is looked up using
      the page size from the segmentation hardware (i.e. the SLB), it is
      possible to have the HPT entry indicate a larger page size.  Thus for
      example it is possible to put a 16MB page in a 64kB segment, but since
      the hash lookup is done using a 64kB page size, it may be necessary to
      put multiple entries in the HPT for a single 16MB page.  This
      capability is called mixed page-size segment (MPSS).  With MPSS,
      there are two relevant page sizes: the base page size, which is the
      size used in searching the HPT, and the actual page size, which is the
      size indicated in the HPT entry. [ Note that the actual page size is
      always >= base page size ].
      
      We use "ibm,segment-page-sizes" device tree node to advertise
      the MPSS support to PAPR guest. The penc encoding indicates whether
      we support a specific combination of base page size and actual
      page size in the same segment. We also use the penc value in the
      LP encoding of HPTE entry.
      
      This patch exposes MPSS support to the KVM guest by advertising the
      feature via "ibm,segment-page-sizes". It also adds the necessary changes
      to decode the base page size and the actual page size correctly from the
      HPTE entry.
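
      As a worked example of the two sizes (a sketch, not code from the patch),
      the number of HPT entries one large page needs follows directly from the
      two shifts:

          /* A 16MB page (shift 24) in a 64kB base segment (shift 16) is hashed
           * per 64kB virtual page, so it takes 1 << (24 - 16) = 256 entries. */
          static inline unsigned long hpt_entries_per_page(unsigned int base_shift,
                                                           unsigned int actual_shift)
          {
                  return 1UL << (actual_shift - base_shift);
          }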
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S PR: Expose EBB registers · 2e23f544
      Authored by Alexander Graf
      POWER8 introduces a new facility called the "Event Based Branch" facility.
      It consists of a few registers that indicate where a guest should branch to
      when a defined event occurs while it is in PR mode.
      
      We don't really want to enable EBB, as it would create a big mess with !PR
      guest mode while the hardware is in PR, and we don't really emulate the
      PMU anyway.
      
      So instead, let's just leave it at emulating all of its registers.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S PR: Expose TAR facility to guest · e14e7a1e
      Authored by Alexander Graf
      POWER8 implements a new register called TAR. This register has to be
      enabled in FSCR; beyond that, from KVM's point of view, it is mere storage.
      
      This patch enables the guest to use TAR.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR · 616dff86
      Authored by Alexander Graf
      POWER8 introduced a new interrupt type called the "Facility unavailable
      interrupt", which reports its status in a new register called FSCR.
      
      Handle these exits and try to emulate instructions for unhandled facilities.
      Follow-on patches enable KVM to expose specific facilities into the guest.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Make shared struct aka magic page guest endian · 5deb8e7a
      Authored by Alexander Graf
      The shared (magic) page is a data structure that contains often used
      supervisor privileged SPRs accessible via memory to the user to reduce
      the number of exits we have to take to read/write them.
      
      When we actually share this structure with the guest we have to maintain
      it in guest endianness, because some of the patch tricks only work with
      native endian load/store operations.
      
      Since we only share the structure with either host or guest in little
      endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv.
      
      For booke, the shared struct stays big endian. For book3s_64 hv we maintain
      the struct in host native endian, since it never gets shared with the guest.
      
      For book3s_64 pr we introduce a variable that tells us which endianness the
      shared struct is in and route every access to it through helper inline
      functions that evaluate this variable.
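
      A minimal sketch of that accessor pattern (the real patch generates one
      helper per field; the names here are illustrative, with msr as the example):

          /* Read a shared-struct field in whichever endianness it is kept in. */
          static inline u64 kvmppc_get_shared_msr(struct kvm_vcpu *vcpu)
          {
                  if (vcpu->arch.shared_big_endian)
                          return be64_to_cpu(vcpu->arch.shared->msr);
                  else
                          return le64_to_cpu(vcpu->arch.shared->msr);
          }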
      Signed-off-by: Alexander Graf <agraf@suse.de>
  4. 28 May 2014 (2 commits)
  5. 29 Mar 2014 (2 commits)
  6. 26 Mar 2014 (2 commits)
    • KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n · 7505258c
      Authored by Anton Blanchard
      I noticed KVM is broken when KVM in-kernel XICS emulation
      (CONFIG_KVM_XICS) is disabled.
      
      The problem was introduced in 48eaef05 (KVM: PPC: Book3S HV: use
      xics_wake_cpu only when defined). It used CONFIG_KVM_XICS to wrap
      xics_wake_cpu, where CONFIG_PPC_ICP_NATIVE should have been
      used.
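
      The shape of the fix is simply to guard the call with the option that
      actually provides the symbol (a sketch of the pattern, not the exact hunk):

          #ifdef CONFIG_PPC_ICP_NATIVE
                  xics_wake_cpu(cpu);             /* native ICP backend IPI */
          #else
                  smp_send_reschedule(cpu);       /* generic fallback */
          #endif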
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Scott Wood <scottwood@freescale.com>
    • KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write · e59d24e6
      Authored by Greg Kurz
      When the guest does an MMIO write which is handled successfully by an
      ioeventfd, ioeventfd_write() returns 0 (success) and
      kvmppc_handle_store() returns EMULATE_DONE.  Then
      kvmppc_emulate_mmio() converts EMULATE_DONE to RESUME_GUEST_NV and
      this causes an exit from the loop in kvmppc_vcpu_run_hv(), causing an
      exit back to userspace with a bogus exit reason code, typically
      causing userspace (e.g. qemu) to crash with a message about an unknown
      exit code.
      
      This adds handling of RESUME_GUEST_NV in kvmppc_vcpu_run_hv() in order
      to fix that.  For generality, we define a helper to check for either
      of the return-to-guest codes we use, RESUME_GUEST and RESUME_GUEST_NV,
      to make it easy to check for either and provide one place to update if
      any other return-to-guest code gets defined in future.
      
      Since it only affects Book3S HV for now, the helper is added to
      the kvm_book3s.h header file.
      
      We use the helper in two places in kvmppc_run_core() as well for
      future-proofing, though we don't see RESUME_GUEST_NV in either place
      at present.
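
      Based on that description, the helper is essentially (name as described,
      living in kvm_book3s.h):

          /* True for either of the return-to-guest codes. */
          static inline bool is_kvmppc_resume_guest(int r)
          {
                  return (r == RESUME_GUEST || r == RESUME_GUEST_NV);
          }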
      
      [paulus@samba.org - combined 4 patches into one, rewrote description]
      Suggested-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  7. 27 Jan 2014 (10 commits)
    • KVM: PPC: Book3S HV: Add new state for transactional memory · 7b490411
      Authored by Michael Neuling
      Add new state for transactional memory (TM) to kvm_vcpu_arch.  Also add
      asm-offset bits that are going to be required.
      
      This also moves the existing TFHAR, TFIAR and TEXASR SPRs into a
      CONFIG_PPC_TRANSACTIONAL_MEM section.  This requires some code changes to
      ensure we still compile with CONFIG_PPC_TRANSACTIONAL_MEM=N.  Many of the
      added #ifdefs are removed in a later patch when the bulk of the TM code is
      added.
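
      The new state amounts to extra fields in struct kvm_vcpu_arch guarded by
      the config option; an illustrative (partial, assumed) subset:

          #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
                  u64   tfhar;            /* TM failure handler address */
                  u64   tfiar;            /* TM failure instruction address */
                  u64   texasr;           /* TM exception and summary register */
                  /* checkpointed register state at tbegin (subset, assumed) */
                  ulong tm_gpr[32];
          #endif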
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      [agraf: fix merge conflict]
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Basic little-endian guest support · d682916a
      Authored by Anton Blanchard
      We create a guest MSR from scratch when delivering exceptions in
      a few places.  Instead of extracting LPCR[ILE] and inserting it
      into MSR_LE each time, we simply create a new variable intr_msr which
      contains the entire MSR to use.  For a little-endian guest, userspace
      needs to set the ILE (interrupt little-endian) bit in the LPCR for
      each vcpu (or at least one vcpu in each virtual core).
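
      A sketch of the idea (bit names as described; the recomputation would live
      wherever the LPCR is updated):

          /* Recompute the MSR used for exception delivery when the LPCR
           * changes, instead of extracting LPCR[ILE] on every exception. */
          vcpu->arch.intr_msr = MSR_SF;                   /* 64-bit interrupts */
          if (vcpu->arch.vcore->lpcr & LPCR_ILE)
                  vcpu->arch.intr_msr |= MSR_LE;          /* little-endian guest */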
      
      [paulus@samba.org - removed H_SET_MODE implementation from original
      version of the patch, and made kvmppc_set_lpcr update vcpu->arch.intr_msr.]
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 · 8563bf52
      Authored by Paul Mackerras
      The DABRX (DABR extension) register on POWER7 processors provides finer
      control over which accesses cause a data breakpoint interrupt.  It
      contains 3 bits which indicate whether to enable accesses in user,
      kernel and hypervisor modes respectively to cause data breakpoint
      interrupts, plus one bit that enables both real mode and virtual mode
      accesses to cause interrupts.  Currently, KVM sets DABRX to allow
      both kernel and user accesses to cause interrupts while in the guest.
      
      This adds support for the guest to specify other values for DABRX.
      PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
      and DABRX with one call.  This adds a real-mode implementation of
      H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
      implementation.  To support this, we add a per-vcpu field to store the
      DABRX value plus code to get and set it via the ONE_REG interface.
      
      For Linux guests to use this new hcall, userspace needs to add
      "hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
      property in the device tree.  If userspace does this and then migrates
      the guest to a host where the kernel doesn't include this patch, then
      userspace will need to implement H_SET_XDABR by writing the specified
      DABR value to the DABR using the ONE_REG interface.  In that case, the
      old kernel will set DABRX to DABRX_USER | DABRX_KERNEL.  That should
      still work correctly, at least for Linux guests, since Linux guests
      cope with getting data breakpoint interrupts in modes that weren't
      requested by just ignoring the interrupt, and Linux guests never set
      DABRX_BTI.
      
      The other thing this does is to make H_SET_DABR and H_SET_XDABR work
      on POWER8, which has the DAWR and DAWRX instead of DABR/X.  Guests that
      know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
      guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
      For them, this adds the logic to convert DABR/X values into DAWR/X values
      on POWER8.
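
      That userspace fallback would be roughly (a sketch; error handling elided):

          #include <stdint.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>

          /* Emulate H_SET_XDABR on an old host: write DABR via ONE_REG and let
           * the kernel use its fixed DABRX_USER | DABRX_KERNEL setting. */
          static int fallback_set_xdabr(int vcpu_fd, uint64_t dabr)
          {
                  struct kvm_one_reg reg = {
                          .id   = KVM_REG_PPC_DABR,
                          .addr = (uint64_t)(uintptr_t)&dabr,
                  };

                  return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
          }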
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells · 5d00f66b
      Authored by Paul Mackerras
      POWER8 has support for hypervisor doorbell interrupts.  Though the
      kernel doesn't use them for IPIs on the powernv platform yet, it
      probably will in future, so this makes KVM cope gracefully if a
      hypervisor doorbell interrupt arrives while in a guest.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8 · e0622bd9
      Authored by Paul Mackerras
      POWER8 has a bit in the LPCR to enable or disable the PURR and SPURR
      registers to count when in the guest.  Set this bit.
      
      POWER8 has a field in the LPCR called AIL (Alternate Interrupt Location)
      which is used to enable relocation-on interrupts.  Allow userspace to
      set this field.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8 · 5557ae0e
      Authored by Paul Mackerras
      This allows us to select architecture 2.05 (POWER6) or 2.06 (POWER7)
      compatibility modes on a POWER8 processor.  (Note that transactional
      memory is disabled for usermode if either or both of the PCR_TM_DIS
      and PCR_ARCH_206 bits are set.)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Add handler for HV facility unavailable · bd3048b8
      Authored by Michael Ellerman
      At present this should never happen, since the host kernel sets
      HFSCR to allow access to all facilities.  It's better to be prepared
      to handle it cleanly if it does ever happen, though.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs · b005255e
      Authored by Michael Neuling
      This adds fields to the struct kvm_vcpu_arch to store the new
      guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
      functions to allow userspace to access this state, and adds code to
      the guest entry and exit to context-switch these SPRs between host
      and guest.
      
      Note that DPDES (Directed Privileged Doorbell Exception State) is
      shared between threads on a core; hence we store it in struct
      kvmppc_vcore and have the master thread save and restore it.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers · e0b7ec05
      Authored by Paul Mackerras
      On a threaded processor such as POWER7, we group VCPUs into virtual
      cores and arrange that the VCPUs in a virtual core run on the same
      physical core.  Currently we don't enforce any correspondence between
      virtual thread numbers within a virtual core and physical thread
      numbers.  Physical threads are allocated starting at 0 on a first-come
      first-served basis to runnable virtual threads (VCPUs).
      
      POWER8 implements a new "msgsndp" instruction which guest kernels can
      use to interrupt other threads in the same core or sub-core.  Since
      the instruction takes the destination physical thread ID as a parameter,
      it becomes necessary to align the physical thread IDs with the virtual
      thread IDs, that is, to make sure virtual thread N within a virtual
      core always runs on physical thread N.
      
      This means that it's possible that thread 0, which is where we call
      __kvmppc_vcore_entry, may end up running some other vcpu than the
      one whose task called kvmppc_run_core(), or it may end up running
      no vcpu at all, if for example thread 0 of the virtual core is
      currently executing in userspace.  However, we do need thread 0
      to be responsible for switching the MMU -- a previous version of
      this patch that had other threads switching the MMU was found to
      be responsible for occasional memory corruption and machine check
      interrupts in the guest on POWER7 machines.
      
      To accommodate this, we no longer pass the vcpu pointer to
      __kvmppc_vcore_entry, but instead let the assembly code load it from
      the PACA.  Since the assembly code will need to know the kvm pointer
      and the thread ID for threads which don't have a vcpu, we move the
      thread ID into the PACA and we add a kvm pointer to the virtual core
      structure.
      
      In the case where thread 0 has no vcpu to run, it still calls into
      kvmppc_hv_entry in order to do the MMU switch, and then naps until
      either its vcpu is ready to run in the guest, or some other thread
      needs to exit the guest.  In the latter case, thread 0 jumps to the
      code that switches the MMU back to the host.  This control flow means
      that now we switch the MMU before loading any guest vcpu state.
      Similarly, on guest exit we now save all the guest vcpu state before
      switching the MMU back to the host.  This has required substantial
      code movement, making the diff rather large.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: use xics_wake_cpu only when defined · 48eaef05
      Authored by Andreas Schwab
      Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Alexander Graf <agraf@suse.de>
  8. 09 Jan 2014 (2 commits)
  9. 13 Dec 2013 (1 commit)
  10. 21 Nov 2013 (1 commit)
  11. 19 Nov 2013 (2 commits)
    • KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call · c9438092
      Authored by Paul Mackerras
      Running a kernel with CONFIG_PROVE_RCU=y yields the following diagnostic:
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      3.12.0-rc5-kvm+ #9 Not tainted
      -------------------------------
      
      include/linux/kvm_host.h:473 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      1 lock held by qemu-system-ppc/4831:
      
      stack backtrace:
      CPU: 28 PID: 4831 Comm: qemu-system-ppc Not tainted 3.12.0-rc5-kvm+ #9
      Call Trace:
      [c000000be462b2a0] [c00000000001644c] .show_stack+0x7c/0x1f0 (unreliable)
      [c000000be462b370] [c000000000ad57c0] .dump_stack+0x88/0xb4
      [c000000be462b3f0] [c0000000001315e8] .lockdep_rcu_suspicious+0x138/0x180
      [c000000be462b480] [c00000000007862c] .gfn_to_memslot+0x13c/0x170
      [c000000be462b510] [c00000000007d384] .gfn_to_hva_prot+0x24/0x90
      [c000000be462b5a0] [c00000000007d420] .kvm_read_guest_page+0x30/0xd0
      [c000000be462b630] [c00000000007d528] .kvm_read_guest+0x68/0x110
      [c000000be462b6e0] [c000000000084594] .kvmppc_rtas_hcall+0x34/0x180
      [c000000be462b7d0] [c000000000097934] .kvmppc_pseries_do_hcall+0x74/0x830
      [c000000be462b880] [c0000000000990e8] .kvmppc_vcpu_run_hv+0xff8/0x15a0
      [c000000be462b9e0] [c0000000000839cc] .kvmppc_vcpu_run+0x2c/0x40
      [c000000be462ba50] [c0000000000810b4] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0
      [c000000be462bae0] [c00000000007b508] .kvm_vcpu_ioctl+0x478/0x730
      [c000000be462bca0] [c00000000025532c] .do_vfs_ioctl+0x4dc/0x7a0
      [c000000be462bd80] [c0000000002556b4] .SyS_ioctl+0xc4/0xe0
      [c000000be462be30] [c000000000009ee4] syscall_exit+0x0/0x98
      
      To fix this, we take the SRCU read lock around the kvmppc_rtas_hcall()
      call.
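
      That bracketing looks like the following (a sketch of the pattern):

          /* kvm_read_guest() dereferences the memslots under SRCU, so hold
           * the read lock across the RTAS hcall: */
          idx = srcu_read_lock(&vcpu->kvm->srcu);
          rc = kvmppc_rtas_hcall(vcpu);
          srcu_read_unlock(&vcpu->kvm->srcu, idx);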
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S HV: Make tbacct_lock irq-safe · bf3d32e1
      Authored by Paul Mackerras
      Lockdep reported that there is a potential for deadlock because
      vcpu->arch.tbacct_lock is not irq-safe, and is sometimes taken inside
      the rq_lock (run-queue lock) in the scheduler, which is taken within
      interrupts.  The lockdep splat looks like:
      
      ======================================================
      [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
      3.12.0-rc5-kvm+ #8 Not tainted
      ------------------------------------------------------
      qemu-system-ppc/4803 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      (&(&vcpu->arch.tbacct_lock)->rlock){+.+...}, at: [<c0000000000947ac>] .kvmppc_core_vcpu_put_hv+0x2c/0xa0
      
      and this task is already holding:
      (&rq->lock){-.-.-.}, at: [<c000000000ac16c0>] .__schedule+0x180/0xaa0
      which would create a new lock dependency:
      (&rq->lock){-.-.-.} -> (&(&vcpu->arch.tbacct_lock)->rlock){+.+...}
      
      but this new dependency connects a HARDIRQ-irq-safe lock:
      (&rq->lock){-.-.-.}
      ... which became HARDIRQ-irq-safe at:
       [<c00000000013797c>] .lock_acquire+0xbc/0x190
       [<c000000000ac3c74>] ._raw_spin_lock+0x34/0x60
       [<c0000000000f8564>] .scheduler_tick+0x54/0x180
       [<c0000000000c2610>] .update_process_times+0x70/0xa0
       [<c00000000012cdfc>] .tick_periodic+0x3c/0xe0
       [<c00000000012cec8>] .tick_handle_periodic+0x28/0xb0
       [<c00000000001ef40>] .timer_interrupt+0x120/0x2e0
       [<c000000000002868>] decrementer_common+0x168/0x180
       [<c0000000001c7ca4>] .get_page_from_freelist+0x924/0xc10
       [<c0000000001c8e00>] .__alloc_pages_nodemask+0x200/0xba0
       [<c0000000001c9eb8>] .alloc_pages_exact_nid+0x68/0x110
       [<c000000000f4c3ec>] .page_cgroup_init+0x1e0/0x270
       [<c000000000f24480>] .start_kernel+0x3e0/0x4e4
       [<c000000000009d30>] .start_here_common+0x20/0x70
      
      to a HARDIRQ-irq-unsafe lock:
      (&(&vcpu->arch.tbacct_lock)->rlock){+.+...}
      ... which became HARDIRQ-irq-unsafe at:
      ...  [<c00000000013797c>] .lock_acquire+0xbc/0x190
       [<c000000000ac3c74>] ._raw_spin_lock+0x34/0x60
       [<c0000000000946ac>] .kvmppc_core_vcpu_load_hv+0x2c/0x100
       [<c00000000008394c>] .kvmppc_core_vcpu_load+0x2c/0x40
       [<c000000000081000>] .kvm_arch_vcpu_load+0x10/0x30
       [<c00000000007afd4>] .vcpu_load+0x64/0xd0
       [<c00000000007b0f8>] .kvm_vcpu_ioctl+0x68/0x730
       [<c00000000025530c>] .do_vfs_ioctl+0x4dc/0x7a0
       [<c000000000255694>] .SyS_ioctl+0xc4/0xe0
       [<c000000000009ee4>] syscall_exit+0x0/0x98
      
      Some users have reported this deadlock occurring in practice, though
      the reports have been primarily on 3.10.x-based kernels.
      
      This fixes the problem by making tbacct_lock irq-safe.
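
      In practice that means converting the plain spin_lock calls on this lock
      to the irq-saving variants, e.g.:

          unsigned long flags;

          /* now safe to take inside interrupt-disabled sections such as
           * the scheduler's rq->lock */
          spin_lock_irqsave(&vcpu->arch.tbacct_lock, flags);
          /* ... update timebase accounting ... */
          spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags);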
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>