1. 28 7月, 2014 40 次提交
    • A
      KVM: Rename and add argument to check_extension · 784aa3d7
      Alexander Graf 提交于
      In preparation to make the check_extension function available to VM scope
      we add a struct kvm * argument to the function header and rename the function
      accordingly. It will still be called from the /dev/kvm fd, but with a NULL
      argument for struct kvm *.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      784aa3d7
    • S
      Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 · 9678cdaa
      Stewart Smith 提交于
      The POWER8 processor has a Micro Partition Prefetch Engine, which is
      a fancy way of saying "has way to store and load contents of L2 or
      L2+MRU way of L3 cache". We initiate the storing of the log (list of
      addresses) using the logmpp instruction and start restore by writing
      to a SPR.
      
      The logmpp instruction takes parameters in a single 64bit register:
      - starting address of the table to store log of L2/L2+L3 cache contents
        - 32kb for L2
        - 128kb for L2+L3
        - Aligned relative to maximum size of the table (32kb or 128kb)
      - Log control (no-op, L2 only, L2 and L3, abort logout)
      
      We should abort any ongoing logging before initiating one.
      
      To initiate restore, we write to the MPPR SPR. The format of what to write
      to the SPR is similar to the logmpp instruction parameter:
      - starting address of the table to read from (same alignment requirements)
      - table size (no data, until end of table)
      - prefetch rate (from fastest possible to slower. about every 8, 16, 24 or
        32 cycles)
      
      The idea behind loading and storing the contents of L2/L3 cache is to
      reduce memory latency in a system that is frequently swapping vcores on
      a physical CPU.
      
      The best case scenario for doing this is when some vcores are doing very
      cache heavy workloads. The worst case is when they have about 0 cache hits,
      so we just generate needless memory operations.
      
      This implementation just does L2 store/load. In my benchmarks this proves
      to be useful.
      
      Benchmark 1:
       - 16 core POWER8
       - 3x Ubuntu 14.04LTS guests (LE) with 8 VCPUs each
       - No split core/SMT
       - two guests running sysbench memory test.
         sysbench --test=memory --num-threads=8 run
       - one guest running apache bench (of default HTML page)
         ab -n 490000 -c 400 http://localhost/
      
      This benchmark aims to measure performance of real world application (apache)
      where other guests are cache hot with their own workloads. The sysbench memory
      benchmark does pointer sized writes to a (small) memory buffer in a loop.
      
      In this benchmark with this patch I can see an improvement both in requests
      per second (~5%) and in mean and median response times (again, about 5%).
      The spread of minimum and maximum response times were largely unchanged.
      
      benchmark 2:
       - Same VM config as benchmark 1
       - all three guests running sysbench memory benchmark
      
      This benchmark aims to see if there is a positive or negative affect to this
      cache heavy benchmark. Although due to the nature of the benchmark (stores) we
      may not see a difference in performance, but rather hopefully an improvement
      in consistency of performance (when vcore switched in, don't have to wait
      many times for cachelines to be pulled in)
      
      The results of this benchmark are improvements in consistency of performance
      rather than performance itself. With this patch, the few outliers in duration
      go away and we get more consistent performance in each guest.
      
      benchmark 3:
       - same 3 guests and CPU configuration as benchmark 1 and 2.
       - two idle guests
       - 1 guest running STREAM benchmark
      
      This scenario also saw performance improvement with this patch. On Copy and
      Scale workloads from STREAM, I got 5-6% improvement with this patch. For
      Add and triad, it was around 10% (or more).
      
      benchmark 4:
       - same 3 guests as previous benchmarks
       - two guests running sysbench --memory, distinctly different cache heavy
         workload
       - one guest running STREAM benchmark.
      
      Similar improvements to benchmark 3.
      
      benchmark 5:
       - 1 guest, 8 VCPUs, Ubuntu 14.04
       - Host configured with split core (SMT8, subcores-per-core=4)
       - STREAM benchmark
      
      In this benchmark, we see a 10-20% performance improvement across the board
      of STREAM benchmark results with this patch.
      
      Based on preliminary investigation and microbenchmarks
      by Prerna Saxena <prerna@linux.vnet.ibm.com>
      Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9678cdaa
    • S
      Split out struct kvmppc_vcore creation to separate function · de9bdd1a
      Stewart Smith 提交于
      No code changes, just split it out to a function so that with the addition
      of micro partition prefetch buffer allocation (in subsequent patch) looks
      neater and doesn't require excessive indentation.
      Signed-off-by: NStewart Smith <stewart@linux.vnet.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      de9bdd1a
    • P
      KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication · 1b2e33b0
      Paul Mackerras 提交于
      At present, kvmppc_ld calls kvmppc_xlate, and if kvmppc_xlate returns
      any error indication, it returns -ENOENT, which is taken to mean an
      HPTE not found error.  However, the error could have been a segment
      found (no SLB entry) or a permission error.  Similarly,
      kvmppc_pte_to_hva currently does permission checking, but any error
      from it is taken by kvmppc_ld to mean that the access is an emulated
      MMIO access.  Also, kvmppc_ld does no execute permission checking.
      
      This fixes these problems by (a) returning any error from kvmppc_xlate
      directly, (b) moving the permission check from kvmppc_pte_to_hva
      into kvmppc_ld, and (c) adding an execute permission check to kvmppc_ld.
      
      This is similar to what was done for kvmppc_st() by commit 82ff911317c3
      ("KVM: PPC: Deflect page write faults properly in kvmppc_st").
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1b2e33b0
    • P
      KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call · ef1af2e2
      Paul Mackerras 提交于
      This does for PR KVM what c9438092 ("KVM: PPC: Book3S HV: Take SRCU
      read lock around kvm_read_guest() call") did for HV KVM, that is,
      eliminate a "suspicious rcu_dereference_check() usage!" warning by
      taking the SRCU lock around the call to kvmppc_rtas_hcall().
      
      It also fixes a return of RESUME_HOST to return EMULATE_FAIL instead,
      since kvmppc_h_pr() is supposed to return EMULATE_* values.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ef1af2e2
    • A
      KVM: PPC: Book3S: Fix LPCR one_reg interface · a0840240
      Alexey Kardashevskiy 提交于
      Unfortunately, the LPCR got defined as a 32-bit register in the
      one_reg interface.  This is unfortunate because KVM allows userspace
      to control the DPFD (default prefetch depth) field, which is in the
      upper 32 bits.  The result is that DPFD always get set to 0, which
      reduces performance in the guest.
      
      We can't just change KVM_REG_PPC_LPCR to be a 64-bit register ID,
      since that would break existing userspace binaries.  Instead we define
      a new KVM_REG_PPC_LPCR_64 id which is 64-bit.  Userspace can still use
      the old KVM_REG_PPC_LPCR id, but it now only modifies those fields in
      the bottom 32 bits that userspace can modify (ILE, TC and AIL).
      If userspace uses the new KVM_REG_PPC_LPCR_64 id, it can modify DPFD
      as well.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a0840240
    • A
      KVM: PPC: Remove 440 support · b2677b8d
      Alexander Graf 提交于
      The 440 target hasn't been properly functioning for a few releases and
      before I was the only one who fixes a very serious bug that indicates to
      me that nobody used it before either.
      
      Furthermore KVM on 440 is slow to the extent of unusable.
      
      We don't have to carry along completely unused code. Remove 440 and give
      us one less thing to worry about.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b2677b8d
    • B
      KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer · 8c95ead6
      Bharat Bhushan 提交于
      Scott Wood pointed out that We are no longer using SPRG1 for vcpu pointer,
      but using SPRN_SPRG_THREAD <=> SPRG3 (thread->vcpu). So this comment
      is not valid now.
      
      Note: SPRN_SPRG3R is not supported (do not see any need as of now),
      and if we want to support this in future then we have to shift to using
      SPRG1 for VCPU pointer.
      Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8c95ead6
    • B
      KVM: PPC: Booke-hv: Add one reg interface for SPRG9 · 28d2f421
      Bharat Bhushan 提交于
      We now support SPRG9 for guest, so also add a one reg interface for same
      Note: Changes are in bookehv code only as we do not have SPRG9 on booke-pr.
      Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      28d2f421
    • B
      kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit · 99e99d19
      Bharat Bhushan 提交于
      SPRN_SPRG is used by debug interrupt handler, so this is required for
      debug support.
      Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      99e99d19
    • M
      KVM: PPC: Bookehv: Get vcpu's last instruction for emulation · f5250471
      Mihai Caraman 提交于
      On book3e, KVM uses load external pid (lwepx) dedicated instruction to read
      guest last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI
      and LRAT), generated by loading a guest address, needs to be handled by KVM.
      These exceptions are generated in a substituted guest translation context
      (EPLC[EGS] = 1) from host context (MSR[GS] = 0).
      
      Currently, KVM hooks only interrupts generated from guest context (MSR[GS] = 1),
      doing minimal checks on the fast path to avoid host performance degradation.
      lwepx exceptions originate from host state (MSR[GS] = 0) which implies
      additional checks in DO_KVM macro (beside the current MSR[GS] = 1) by looking
      at the Exception Syndrome Register (ESR[EPID]) and the External PID Load Context
      Register (EPLC[EGS]). Doing this on each Data TLB miss exception is obvious
      too intrusive for the host.
      
      Read guest last instruction from kvmppc_load_last_inst() by searching for the
      physical address and kmap it. This address the TODO for TLB eviction and
      execute-but-not-read entries, and allow us to get rid of lwepx until we are
      able to handle failures.
      
      A simple stress benchmark shows a 1% sys performance degradation compared with
      previous approach (lwepx without failure handling):
      
      time for i in `seq 1 10000`; do /bin/echo > /dev/null; done
      
      real    0m 8.85s
      user    0m 4.34s
      sys     0m 4.48s
      
      vs
      
      real    0m 8.84s
      user    0m 4.36s
      sys     0m 4.44s
      
      A solution to use lwepx and to handle its exceptions in KVM would be to temporary
      highjack the interrupt vector from host. This imposes additional synchronizations
      for cores like FSL e6500 that shares host IVOR registers between hardware threads.
      This optimized solution can be later developed on top of this patch.
      Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f5250471
    • M
      KVM: PPC: Allow kvmppc_get_last_inst() to fail · 51f04726
      Mihai Caraman 提交于
      On book3e, guest last instruction is read on the exit path using load
      external pid (lwepx) dedicated instruction. This load operation may fail
      due to TLB eviction and execute-but-not-read entries.
      
      This patch lay down the path for an alternative solution to read the guest
      last instruction, by allowing kvmppc_get_lat_inst() function to fail.
      Architecture specific implmentations of kvmppc_load_last_inst() may read
      last guest instruction and instruct the emulation layer to re-execute the
      guest in case of failure.
      
      Make kvmppc_get_last_inst() definition common between architectures.
      Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      51f04726
    • M
      KVM: PPC: Book3s: Remove kvmppc_read_inst() function · 9a26af64
      Mihai Caraman 提交于
      In the context of replacing kvmppc_ld() function calls with a version of
      kvmppc_get_last_inst() which allow to fail, Alex Graf suggested this:
      
      "If we get EMULATE_AGAIN, we just have to make sure we go back into the guest.
      No need to inject an ISI into  the guest - it'll do that all by itself.
      With an error returning kvmppc_get_last_inst we can just use completely
      get rid of kvmppc_read_inst() and only use kvmppc_get_last_inst() instead."
      
      As a intermediate step get rid of kvmppc_read_inst() and only use kvmppc_ld()
      instead.
      Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9a26af64
    • M
      KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1 · 9c0d4e0d
      Mihai Caraman 提交于
      Add mising defines MAS0_GET_TLBSEL() and MAS1_GET_TSIZE() for Book3E.
      Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9c0d4e0d
    • M
      KVM: PPC: e500mc: Revert "add load inst fixup" · b5741bb3
      Mihai Caraman 提交于
      The commit 1d628af7 "add load inst fixup" made an attempt to handle
      failures generated by reading the guest current instruction. The fixup
      code that was added works by chance hiding the real issue.
      
      Load external pid (lwepx) instruction, used by KVM to read guest
      instructions, is executed in a subsituted guest translation context
      (EPLC[EGS] = 1). In consequence lwepx's TLB error and data storage
      interrupts need to be handled by KVM, even though these interrupts
      are generated from host context (MSR[GS] = 0) where lwepx is executed.
      
      Currently, KVM hooks only interrupts generated from guest context
      (MSR[GS] = 1), doing minimal checks on the fast path to avoid host
      performance degradation. As a result, the host kernel handles lwepx
      faults searching the faulting guest data address (loaded in DEAR) in
      its own Logical Partition ID (LPID) 0 context. In case a host translation
      is found the execution returns to the lwepx instruction instead of the
      fixup, the host ending up in an infinite loop.
      
      Revert the commit "add load inst fixup". lwepx issue will be addressed
      in a subsequent patch without needing fixup code.
      Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b5741bb3
    • B
      kvm: ppc: Add SPRN_EPR get helper function · 34f754b9
      Bharat Bhushan 提交于
      kvmppc_set_epr() is already defined in asm/kvm_ppc.h, So
      rename and move get_epr helper function to same file.
      Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
      [agraf: remove duplicate return]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      34f754b9
    • B
      kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7 · c1b8a01b
      Bharat Bhushan 提交于
      Use kvmppc_set_sprg[0-7]() and kvmppc_get_sprg[0-7]() helper
      functions
      Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c1b8a01b
    • B
      kvm: ppc: booke: Add shared struct helpers of SPRN_ESR · dc168549
      Bharat Bhushan 提交于
      Add and use kvmppc_set_esr() and kvmppc_get_esr() helper functions
      Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      dc168549
    • B
      kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR · a5414d4b
      Bharat Bhushan 提交于
      Uses kvmppc_set_dar() and kvmppc_get_dar() helper functions
      Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a5414d4b
    • B
      kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1 · 31579eea
      Bharat Bhushan 提交于
      Use kvmppc_set_srr0/srr1() and kvmppc_get_srr0/srr1() helper functions
      Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      31579eea
    • B
      kvm: ppc: bookehv: Added wrapper macros for shadow registers · 1dc0c5b8
      Bharat Bhushan 提交于
      There are shadow registers like, GSPRG[0-3], GSRR0, GSRR1 etc on
      BOOKE-HV and these shadow registers are guest accessible.
      So these shadow registers needs to be updated on BOOKE-HV.
      This patch adds new macro for get/set helper of shadow register .
      Signed-off-by: NBharat Bhushan <Bharat.Bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1dc0c5b8
    • A
      KVM: PPC: Book3S: Make magic page properly 4k mappable · 89b68c96
      Alexander Graf 提交于
      The magic page is defined as a 4k page of per-vCPU data that is shared
      between the guest and the host to accelerate accesses to privileged
      registers.
      
      However, when the host is using 64k page size granularity we weren't quite
      as strict about that rule anymore. Instead, we partially treated all of the
      upper 64k as magic page and mapped only the uppermost 4k with the actual
      magic contents.
      
      This works well enough for Linux which doesn't use any memory in kernel
      space in the upper 64k, but Mac OS X got upset. So this patch makes magic
      page actually stay in a 4k range even on 64k page size hosts.
      
      This patch fixes magic page usage with Mac OS X (using MOL) on 64k PAGE_SIZE
      hosts for me.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      89b68c96
    • A
      KVM: PPC: Book3S: Add hack for split real mode · c01e3f66
      Alexander Graf 提交于
      Today we handle split real mode by mapping both instruction and data faults
      into a special virtual address space that only exists during the split mode
      phase.
      
      This is good enough to catch 32bit Linux guests that use split real mode for
      copy_from/to_user. In this case we're always prefixed with 0xc0000000 for our
      instruction pointer and can map the user space process freely below there.
      
      However, that approach fails when we're running KVM inside of KVM. Here the 1st
      level last_inst reader may well be in the same virtual page as a 2nd level
      interrupt handler.
      
      It also fails when running Mac OS X guests. Here we have a 4G/4G split, so a
      kernel copy_from/to_user implementation can easily overlap with user space
      addresses.
      
      The architecturally correct way to fix this would be to implement an instruction
      interpreter in KVM that kicks in whenever we go into split real mode. This
      interpreter however would not receive a great amount of testing and be a lot of
      bloat for a reasonably isolated corner case.
      
      So I went back to the drawing board and tried to come up with a way to make
      split real mode work with a single flat address space. And then I realized that
      we could get away with the same trick that makes it work for Linux:
      
      Whenever we see an instruction address during split real mode that may collide,
      we just move it higher up the virtual address space to a place that hopefully
      does not collide (keep your fingers crossed!).
      
      That approach does work surprisingly well. I am able to successfully run
      Mac OS X guests with KVM and QEMU (no split real mode hacks like MOL) when I
      apply a tiny timing probe hack to QEMU. I'd say this is a win over even more
      broken split real mode :).
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c01e3f66
    • A
      KVM: PPC: Book3S: Stop PTE lookup on write errors · 2e27ecc9
      Alexander Graf 提交于
      When a page lookup failed because we're not allowed to write to the page, we
      should not overwrite that value with another lookup on the second PTEG which
      will return "page not found". Instead, we should just tell the caller that we
      had a permission problem.
      
      This fixes Mac OS X guests looping endlessly in page lookup code for me.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      2e27ecc9
    • A
      KVM: PPC: Deflect page write faults properly in kvmppc_st · 17824b5a
      Alexander Graf 提交于
      When we have a page that we're not allowed to write to, xlate() will already
      tell us -EPERM on lookup of that page. With the code as is we change it into
      a "page missing" error which a guest may get confused about. Instead, just
      tell the caller about the -EPERM directly.
      
      This fixes Mac OS X guests when run with DCBZ32 emulation.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      17824b5a
    • A
      KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct · 1287cb3f
      Alexander Graf 提交于
      When building KVM with a lot of vcores (NR_CPUS is big), we can potentially
      get out of the ld immediate range for dereferences inside that struct.
      
      Move the array to the end of our kvm_arch struct. This fixes compilation
      issues with NR_CPUS=2048 for me.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1287cb3f
    • M
      KVM: PPC: e500: Emulate power management control SPR · debf27d6
      Mihai Caraman 提交于
      For FSL e6500 core the kernel uses power management SPR register (PWRMGTCR0)
      to enable idle power down for cores and devices by setting up the idle count
      period at boot time. With the host already controlling the power management
      configuration the guest could simply benefit from it, so emulate guest request
      as a general store.
      Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      debf27d6
    • A
      KVM: PPC: Book3S HV: Enable for little endian hosts · 6947f948
      Alexander Graf 提交于
      Now that we've fixed all the issues that HV KVM code had on little endian
      hosts, we can enable it in the kernel configuration for users to play with.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      6947f948
    • A
      KVM: PPC: Book3S HV: Fix ABIv2 on LE · 9bf163f8
      Alexander Graf 提交于
      For code that doesn't live in modules we can just branch to the real function
      names, giving us compatibility with ABIv1 and ABIv2.
      
      Do this for the compiled-in code of HV KVM.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9bf163f8
    • A
      KVM: PPC: Book3S HV: Access XICS in BE · 76d072fb
      Alexander Graf 提交于
      On the exit path from the guest we check what type of interrupt we received
      if we received one. This means we're doing hardware access to the XICS interrupt
      controller.
      
      However, when running on a little endian system, this access is byte reversed.
      
      So let's make sure to swizzle the bytes back again and virtually make XICS
      accesses big endian.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      76d072fb
    • A
      KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE · 0865a583
      Alexander Graf 提交于
      Some data structures are always stored in big endian. Among those are the LPPACA
      fields as well as the shadow slb. These structures might be shared with a
      hypervisor.
      
      So whenever we access those fields, make sure we do so in big endian byte order.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      0865a583
    • A
      KVM: PPC: Book3S HV: Access guest VPA in BE · 02407552
      Alexander Graf 提交于
      There are a few shared data structures between the host and the guest. Most
      of them get registered through the VPA interface.
      
      These data structures are defined to always be in big endian byte order, so
      let's make sure we always access them in big endian.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      02407552
    • A
      KVM: PPC: Book3S HV: Make HTAB code LE host aware · 6f22bd32
      Alexander Graf 提交于
      When running on an LE host all data structures are kept in little endian
      byte order. However, the HTAB still needs to be maintained in big endian.
      
      So every time we access any HTAB we need to make sure we do so in the right
      byte order. Fix up all accesses to manually byte swap.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      6f22bd32
    • A
      PPC: Add asm helpers for BE 32bit load/store · 8f6822c4
      Alexander Graf 提交于
      From assembly code we might not only have to explicitly BE access 64bit values,
      but sometimes also 32bit ones. Add helpers that allow for easy use of lwzx/stwx
      in their respective byte-reverse or native form.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      8f6822c4
    • M
      KVM: PPC: e500: Fix default tlb for victim hint · d57cef91
      Mihai Caraman 提交于
      Tlb search operation used for victim hint relies on the default tlb set by the
      host. When hardware tablewalk support is enabled in the host, the default tlb is
      TLB1 which leads KVM to evict the bolted entry. Set and restore the default tlb
      when searching for victim hint.
      Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com>
      Reviewed-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      d57cef91
    • M
      KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling · 9642382e
      Michael Neuling 提交于
      This adds support for the H_SET_MODE hcall.  This hcall is a
      multiplexer that has several functions, some of which are called
      rarely, and some which are potentially called very frequently.
      Here we add support for the functions that set the debug registers
      CIABR (Completed Instruction Address Breakpoint Register) and
      DAWR/DAWRX (Data Address Watchpoint Register and eXtension),
      since they could be updated by the guest as often as every context
      switch.
      
      This also adds a kvmppc_power8_compatible() function to test to see
      if a guest is compatible with POWER8 or not.  The CIABR and DAWR/X
      only exist on POWER8.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9642382e
    • P
      KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled · ae2113a4
      Paul Mackerras 提交于
      This adds code to check that when the KVM_CAP_PPC_ENABLE_HCALL
      capability is used to enable or disable in-kernel handling of an
      hcall, that the hcall is actually implemented by the kernel.
      If not an EINVAL error is returned.
      
      This also checks the default-enabled list of hcalls and prints a
      warning if any hcall there is not actually implemented.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ae2113a4
    • P
      KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling · 699a0ea0
      Paul Mackerras 提交于
      This provides a way for userspace controls which sPAPR hcalls get
      handled in the kernel.  Each hcall can be individually enabled or
      disabled for in-kernel handling, except for H_RTAS.  The exception
      for H_RTAS is because userspace can already control whether
      individual RTAS functions are handled in-kernel or not via the
      KVM_PPC_RTAS_DEFINE_TOKEN ioctl, and because the numeric value for
      H_RTAS is out of the normal sequence of hcall numbers.
      
      Hcalls are enabled or disabled using the KVM_ENABLE_CAP ioctl for the
      KVM_CAP_PPC_ENABLE_HCALL capability on the file descriptor for the VM.
      The args field of the struct kvm_enable_cap specifies the hcall number
      in args[0] and the enable/disable flag in args[1]; 0 means disable
      in-kernel handling (so that the hcall will always cause an exit to
      userspace) and 1 means enable.  Enabling or disabling in-kernel
      handling of an hcall is effective across the whole VM.
      
      The ability for KVM_ENABLE_CAP to be used on a VM file descriptor
      on PowerPC is new, added by this commit.  The KVM_CAP_ENABLE_CAP_VM
      capability advertises that this ability exists.
      
      When a VM is created, an initial set of hcalls are enabled for
      in-kernel handling.  The set that is enabled is the set that have
      an in-kernel implementation at this point.  Any new hcall
      implementations from this point onwards should not be added to the
      default set without a good reason.
      
      No distinction is made between real-mode and virtual-mode hcall
      implementations; the one setting controls them both.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      699a0ea0
    • M
      KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule · 1f0eeb7e
      Mihai Caraman 提交于
      On vcpu schedule, the condition checked for tlb pollution is too loose.
      The tlb entries of a vcpu become polluted (vs stale) only when a different
      vcpu within the same logical partition runs in-between. Optimize the tlb
      invalidation condition keeping last_vcpu per logical partition id.
      
      With the new invalidation condition, a guest shows 4% performance improvement
      on P5020DS while running a memory stress application with the cpu oversubscribed,
      the other guest running a cpu intensive workload.
      
      Guest - old invalidation condition
        real 3.89
        user 3.87
        sys 0.01
      
      Guest - enhanced invalidation condition
        real 3.75
        user 3.73
        sys 0.01
      
      Host
        real 3.70
        user 1.85
        sys 0.00
      
      The memory stress application accesses 4KB pages backed by 75% of available
      TLB0 entries:
      
      char foo[ENTRIES][4096] __attribute__ ((aligned (4096)));
      
      int main()
      {
      	char bar;
      	int i, j;
      
      	for (i = 0; i < ITERATIONS; i++)
              	for (j = 0; j < ENTRIES; j++)
                  		bar = foo[j][0];
      
      	return 0;
      }
      Signed-off-by: NMihai Caraman <mihai.caraman@freescale.com>
      Reviewed-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      1f0eeb7e
    • A
      KVM: PPC: Book3S PR: Fix sparse endian checks · f396df35
      Alexander Graf 提交于
      While sending sparse with endian checks over the code base, it triggered at
      some places that were missing casts or had wrong types. Fix them up.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      f396df35