1. 18 5月, 2018 13 次提交
    • P
      KVM: PPC: Book3S PR: Enable use on POWER9 inside HPT-mode guests · ec531d02
      Paul Mackerras 提交于
      This relaxes the restriction on using PR KVM on POWER9.  The existing
      code does work inside a guest partition running in HPT mode, because
      hypercalls such as H_ENTER use the old HPTE format, not the new
      format used by POWER9, and so no change to PR KVM's HPT manipulation
      code is required.  PR KVM will still refuse to run if the kernel is
      using radix translation or if it is running bare-metal.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ec531d02
    • N
      KVM: PPC: Book3S HV: Send kvmppc_bad_interrupt NMIs to Linux handlers · 7c1bd80c
      Nicholas Piggin 提交于
      It's possible to take a SRESET or MCE in these paths due to a bug
      in the host code or a NMI IPI, etc. A recent bug attempting to load
      a virtual address from real mode gave th complete but cryptic error,
      abridged:
      
            Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
            LE SMP NR_CPUS=2048 NUMA PowerNV
            CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted
            NIP:  c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580
            REGS: c000000fff76dd80 TRAP: 0200   Not tainted
            MSR:  9000000000201003 <SF,HV,ME,RI,LE>  CR: 48082222  XER: 00000000
            CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3
            NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
            LR [c0000000000c2430] do_tlbies+0x230/0x2f0
      
      Sending the NMIs through the Linux handlers gives a nicer output:
      
            Severe Machine check interrupt [Not recovered]
              NIP [c0000000000155ac]: perf_trace_tlbie+0x2c/0x1a0
              Initiator: CPU
              Error type: Real address [Load (bad)]
                Effective address: d00017fffcc01a28
            opal: Machine check interrupt unrecoverable: MSR(RI=0)
            opal: Hardware platform error: Unrecoverable Machine Check exception
            CPU: 0 PID: 6700 Comm: qemu-system-ppc Tainted: G   M
            NIP:  c0000000000155ac LR: c0000000000c23c0 CTR: c000000000015580
            REGS: c000000fff9e9d80 TRAP: 0200   Tainted: G   M
            MSR:  9000000000201001 <SF,HV,ME,LE>  CR: 48082222  XER: 00000000
            CFAR: 000000010cbc1a30 DAR: d00017fffcc01a28 DSISR: 00000040 SOFTE: 3
            NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
            LR [c0000000000c23c0] do_tlbies+0x1c0/0x280
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7c1bd80c
    • N
      KVM: PPC: Book3S HV: Fix kvmppc_bad_host_intr for real mode interrupts · eadce3b4
      Nicholas Piggin 提交于
      When CONFIG_RELOCATABLE=n, the Linux real mode interrupt handlers call
      into KVM using real address. This needs to be translated to the kernel
      linear effective address before the MMU is switched on.
      
      kvmppc_bad_host_intr misses adding these bits, so when it is used to
      handle a system reset interrupt (that always gets delivered in real
      mode), it results in an instruction access fault immediately after
      the MMU is turned on.
      
      Fix this by ensuring the top 2 address bits are set when the MMU is
      turned on.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      eadce3b4
    • N
      KVM: PPC: Book3S HV: radix: Do not clear partition PTE when RC or write bits do not match · 878cf2bb
      Nicholas Piggin 提交于
      Adding the write bit and RC bits to pte permissions does not require a
      pte clear and flush. There should not be other bits changed here,
      because restricting access or changing the PFN must have already
      invalidated any existing ptes (otherwise the race is already lost).
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      878cf2bb
    • N
      KVM: PPC: Book3S HV: radix: Refine IO region partition scope attributes · bc64dd0e
      Nicholas Piggin 提交于
      When the radix fault handler has no page from the process address
      space (e.g., for IO memory), it looks up the process pte and sets
      partition table pte using that to get attributes like CI and guarded.
      If the process table entry is to be writable, set _PAGE_DIRTY as well
      to avoid an RC update. If not, then ensure _PAGE_DIRTY does not come
      across. Set _PAGE_ACCESSED as well to avoid RC update.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      bc64dd0e
    • N
      KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on · 9a4506e1
      Nicholas Piggin 提交于
      The radix guest code can has fewer restrictions about what context it
      can run in, so move this flushing out of assembly and have it use the
      Linux TLB flush implementations introduced previously.
      
      This allows powerpc:tlbie trace events to be used.
      
      This changes the tlbiel sequence to only execute RIC=2 flush once on
      the first set flushed, then RIC=0 for the rest of the sets. The end
      result of the flush should be unchanged. This matches the local PID
      flush pattern that was introduced in a5998fcb ("powerpc/mm/radix:
      Optimise tlbiel flush all case").
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      9a4506e1
    • N
      KVM: PPC: Book3S HV: Make radix use the Linux translation flush functions for partition scope · d91cb39f
      Nicholas Piggin 提交于
      This has the advantage of consolidating TLB flush code in fewer
      places, and it also implements powerpc:tlbie trace events.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      d91cb39f
    • N
      KVM: PPC: Book3S HV: Recursively unmap all page table entries when unmapping · a5704e83
      Nicholas Piggin 提交于
      When partition scope mappings are unmapped with kvm_unmap_radix, the
      pte is cleared, but the page table structure is left in place. If the
      next page fault requests a different page table geometry (e.g., due to
      THP promotion or split), kvmppc_create_pte is responsible for changing
      the page tables.
      
      When a page table entry is to be converted to a large pte, the page
      table entry is cleared, the PWC flushed, then the page table it points
      to freed. This will cause pte page tables to leak when a 1GB page is
      to replace a pud entry points to a pmd table with pte tables under it:
      The pmd table will be freed, but its pte tables will be missed.
      
      Fix this by replacing the simple clear and free code with one that
      walks down the page tables and frees children. Care must be taken to
      clear the root entry being unmapped then flushing the PWC before
      freeing any page tables, as explained in comments.
      
      This requires PWC flush to logically become a flush-all-PWC (which it
      already is in hardware, but the KVM API needs to be changed to avoid
      confusion).
      
      This code also checks that no unexpected pte entries exist in any page
      table being freed, and unmaps those and emits a WARN. This is an
      expensive operation for the pte page level, but partition scope
      changes are rare, so it's unconditional for now to iron out bugs. It
      can be put under a CONFIG option or removed after some time.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      a5704e83
    • N
    • N
      KVM: PPC: Book3S HV: Lockless tlbie for HPT hcalls · b7557451
      Nicholas Piggin 提交于
      tlbies to an LPAR do not have to be serialised since POWER4/PPC970,
      after which the MMU_FTR_LOCKLESS_TLBIE feature was introduced to
      avoid tlbie locking.
      
      Since commit c17b98cf ("KVM: PPC: Book3S HV: Remove code for
      PPC970 processors"), KVM no longer supports processors that do not
      have this feature, so the tlbie locking can be removed completely.
      A sanity check for the feature is put in kvmppc_mmu_hv_init.
      
      Testing was done on a POWER9 system in HPT mode, with a -smp 32 guest
      in HPT mode. 32 instances of the powerpc fork benchmark from selftests
      were run with --fork, and the results measured.
      
      Without this patch, total throughput was about 13.5K/sec, and this is
      the top of the host profile:
      
         74.52%  [k] do_tlbies
          2.95%  [k] kvmppc_book3s_hv_page_fault
          1.80%  [k] calc_checksum
          1.80%  [k] kvmppc_vcpu_run_hv
          1.49%  [k] kvmppc_run_core
      
      After this patch, throughput was about 51K/sec, with this profile:
      
         21.28%  [k] do_tlbies
          5.26%  [k] kvmppc_run_core
          4.88%  [k] kvmppc_book3s_hv_page_fault
          3.30%  [k] _raw_spin_lock_irqsave
          3.25%  [k] gup_pgd_range
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      b7557451
    • S
      KVM: PPC: Fix a mmio_host_swabbed uninitialized usage issue · f19d1f36
      Simon Guo 提交于
      When KVM emulates VMX store, it will invoke kvmppc_get_vmx_data() to
      retrieve VMX reg val. kvmppc_get_vmx_data() will check mmio_host_swabbed
      to decide which double word of vr[] to be used. But the
      mmio_host_swabbed can be uninitialized during VMX store procedure:
      
      kvmppc_emulate_loadstore
      	\- kvmppc_handle_store128_by2x64
      		\- kvmppc_get_vmx_data
      
      So vcpu->arch.mmio_host_swabbed is not meant to be used at all for
      emulation of store instructions, and this patch makes that true for
      VMX stores. This patch also initializes mmio_host_swabbed to avoid
      possible future problems.
      Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      f19d1f36
    • S
      KVM: PPC: Move nip/ctr/lr/xer registers to pt_regs in kvm_vcpu_arch · 173c520a
      Simon Guo 提交于
      This patch moves nip/ctr/lr/xer registers from scattered places in
      kvm_vcpu_arch to pt_regs structure.
      
      cr register is "unsigned long" in pt_regs and u32 in vcpu->arch.
      It will need more consideration and may move in later patches.
      Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      173c520a
    • S
      KVM: PPC: Add pt_regs into kvm_vcpu_arch and move vcpu->arch.gpr[] into it · 1143a706
      Simon Guo 提交于
      Current regs are scattered at kvm_vcpu_arch structure and it will
      be more neat to organize them into pt_regs structure.
      
      Also it will enable reimplementation of MMIO emulation code with
      analyse_instr() later.
      Signed-off-by: NSimon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      1143a706
  2. 17 5月, 2018 12 次提交
    • S
      KVM: PPC: Book3S: Change return type to vm_fault_t · 16d5c39d
      Souptick Joarder 提交于
      Use new return type vm_fault_t for fault handler
      in struct vm_operations_struct. For now, this is
      just documenting that the function returns a
      VM_FAULT value rather than an errno.  Once all
      instances are converted, vm_fault_t will become
      a distinct type.
      
      commit 1c8f4220 ("mm: change return type to
      vm_fault_t")
      Signed-off-by: NSouptick Joarder <jrdr.linux@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      16d5c39d
    • A
      KVM: PPC: Book3S: Check KVM_CREATE_SPAPR_TCE_64 parameters · e45719af
      Alexey Kardashevskiy 提交于
      Although it does not seem possible to break the host by passing bad
      parameters when creating a TCE table in KVM, it is still better to get
      an early clear indication of that than debugging weird effect this might
      bring.
      
      This adds some sanity checks that the page size is 4KB..16GB as this is
      what the actual LoPAPR supports and that the window actually fits 64bit
      space.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: NBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e45719af
    • A
      KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages · ca1fc489
      Alexey Kardashevskiy 提交于
      At the moment we only support in the host the IOMMU page sizes which
      the guest is aware of, which is 4KB/64KB/16MB. However P9 does not support
      16MB IOMMU pages, 2MB and 1GB pages are supported instead. We can still
      emulate bigger guest pages (for example 16MB) with smaller host pages
      (4KB/64KB/2MB).
      
      This allows the physical IOMMU pages to use a page size smaller or equal
      than the guest visible IOMMU page size.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ca1fc489
    • A
      KVM: PPC: Book3S: Use correct page shift in H_STUFF_TCE · c6b61661
      Alexey Kardashevskiy 提交于
      The other TCE handlers use page shift from the guest visible TCE table
      (described by kvmppc_spapr_tce_iommu_table) so let's make H_STUFF_TCE
      handlers do the same thing.
      
      This should cause no behavioral change now but soon we will allow
      the iommu_table::it_page_shift being different from from the emulated
      table page size so this will play a role.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Acked-by: NBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      c6b61661
    • P
      KVM: PPC: Book3S HV: Fix inaccurate comment · 48e70b1c
      Paul Mackerras 提交于
      We now have interrupts hard-disabled when coming back from
      kvmppc_hv_entry_trampoline, so this changes the comment to reflect
      that.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      48e70b1c
    • P
      KVM: PPC: Book3S HV: Set RWMR on POWER8 so PURR/SPURR count correctly · 7aa15842
      Paul Mackerras 提交于
      Although Linux doesn't use PURR and SPURR ((Scaled) Processor
      Utilization of Resources Register), other OSes depend on them.
      On POWER8 they count at a rate depending on whether the VCPU is
      idle or running, the activity of the VCPU, and the value in the
      RWMR (Region-Weighting Mode Register).  Hardware expects the
      hypervisor to update the RWMR when a core is dispatched to reflect
      the number of online VCPUs in the vcore.
      
      This adds code to maintain a count in the vcore struct indicating
      how many VCPUs are online.  In kvmppc_run_core we use that count
      to set the RWMR register on POWER8.  If the core is split because
      of a static or dynamic micro-threading mode, we use the value for
      8 threads.  The RWMR value is not relevant when the host is
      executing because Linux does not use the PURR or SPURR register,
      so we don't bother saving and restoring the host value.
      
      For the sake of old userspace which does not set the KVM_REG_PPC_ONLINE
      register, we set online to 1 if it was 0 at the time of a KVM_RUN
      ioctl.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7aa15842
    • P
      KVM: PPC: Book3S HV: Add 'online' register to ONE_REG interface · a1f15826
      Paul Mackerras 提交于
      This adds a new KVM_REG_PPC_ONLINE register which userspace can set
      to 0 or 1 via the GET/SET_ONE_REG interface to indicate whether it
      considers the VCPU to be offline (0), that is, not currently running,
      or online (1).  This will be used in a later patch to configure the
      register which controls PURR and SPURR accumulation.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      a1f15826
    • P
      KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path · df158189
      Paul Mackerras 提交于
      A radix guest can execute tlbie instructions to invalidate TLB entries.
      After a tlbie or a group of tlbies, it must then do the architected
      sequence eieio; tlbsync; ptesync to ensure that the TLB invalidation
      has been processed by all CPUs in the system before it can rely on
      no CPU using any translation that it just invalidated.
      
      In fact it is the ptesync which does the actual synchronization in
      this sequence, and hardware has a requirement that the ptesync must
      be executed on the same CPU thread as the tlbies which it is expected
      to order.  Thus, if a vCPU gets moved from one physical CPU to
      another after it has done some tlbies but before it can get to do the
      ptesync, the ptesync will not have the desired effect when it is
      executed on the second physical CPU.
      
      To fix this, we do a ptesync in the exit path for radix guests.  If
      there are any pending tlbies, this will wait for them to complete.
      If there aren't, then ptesync will just do the same as sync.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      df158189
    • B
      KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change · 9dc81d6b
      Benjamin Herrenschmidt 提交于
      When a vcpu priority (CPPR) is set to a lower value (masking more
      interrupts), we stop processing interrupts already in the queue
      for the priorities that have now been masked.
      
      If those interrupts were previously re-routed to a different
      CPU, they might still be stuck until the older one that has
      them in its queue processes them. In the case of guest CPU
      unplug, that can be never.
      
      To address that without creating additional overhead for
      the normal interrupt processing path, this changes H_CPPR
      handling so that when such a priority change occurs, we
      scan the interrupt queue for that vCPU, and for any
      interrupt in there that has been re-routed, we replace it
      with a dummy and force a re-trigger.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      9dc81d6b
    • N
      KVM: PPC: Book3S HV: Make radix clear pte when unmapping · 7e3d9a1d
      Nicholas Piggin 提交于
      The current partition table unmap code clears the _PAGE_PRESENT bit
      out of the pte, which leaves pud_huge/pmd_huge true and does not
      clear pud_present/pmd_present.  This can confuse subsequent page
      faults and possibly lead to the guest looping doing continual
      hypervisor page faults.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      7e3d9a1d
    • N
      KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page · e2560b10
      Nicholas Piggin 提交于
      The standard eieio ; tlbsync ; ptesync must follow tlbie to ensure it
      is ordered with respect to subsequent operations.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e2560b10
    • P
      KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry · 57b8daa7
      Paul Mackerras 提交于
      Currently, the HV KVM guest entry/exit code adds the timebase offset
      from the vcore struct to the timebase on guest entry, and subtracts
      it on guest exit.  Which is fine, except that it is possible for
      userspace to change the offset using the SET_ONE_REG interface while
      the vcore is running, as there is only one timebase offset per vcore
      but potentially multiple VCPUs in the vcore.  If that were to happen,
      KVM would subtract a different offset on guest exit from that which
      it had added on guest entry, leading to the timebase being out of sync
      between cores in the host, which then leads to bad things happening
      such as hangs and spurious watchdog timeouts.
      
      To fix this, we add a new field 'tb_offset_applied' to the vcore struct
      which stores the offset that is currently applied to the timebase.
      This value is set from the vcore tb_offset field on guest entry, and
      is what is subtracted from the timebase on guest exit.  Since it is
      zero when the timebase offset is not applied, we can simplify the
      logic in kvmhv_start_timing and kvmhv_accumulate_time.
      
      In addition, we had secondary threads reading the timebase while
      running concurrently with code on the primary thread which would
      eventually add or subtract the timebase offset from the timebase.
      This occurred while saving or restoring the DEC register value on
      the secondary threads.  Although no specific incorrect behaviour has
      been observed, this is a race which should be fixed.  To fix it, we
      move the DEC saving code to just before we call kvmhv_commence_exit,
      and the DEC restoring code to after the point where we have waited
      for the primary thread to switch the MMU context and add the timebase
      offset.  That way we are sure that the timebase contains the guest
      timebase value in both cases.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      57b8daa7
  3. 15 5月, 2018 1 次提交
  4. 27 4月, 2018 1 次提交
  5. 11 4月, 2018 1 次提交
    • N
      KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode · 19ce7909
      Nicholas Piggin 提交于
      This crashes with a "Bad real address for load" attempting to load
      from the vmalloc region in realmode (faulting address is in DAR).
      
        Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
        LE SMP NR_CPUS=2048 NUMA PowerNV
        CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted 4.16.0-01530-g43d1859f0994
        NIP:  c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580
        REGS: c000000fff76dd80 TRAP: 0200   Not tainted  (4.16.0-01530-g43d1859f0994)
        MSR:  9000000000201003 <SF,HV,ME,RI,LE>  CR: 48082222  XER: 00000000
        CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3
        NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
        LR [c0000000000c2430] do_tlbies+0x230/0x2f0
      
      I suspect the reason is the per-cpu data is not in the linear chunk.
      This could be restored if that was able to be fixed, but for now,
      just remove the tracepoints.
      
      Fixes: 0428491c ("powerpc/mm: Trace tlbie(l) instructions")
      Cc: stable@vger.kernel.org # v4.13+
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      19ce7909
  6. 03 4月, 2018 1 次提交
  7. 31 3月, 2018 2 次提交
    • N
      powerpc/64s: Remove POWER4 support · 471d7ff8
      Nicholas Piggin 提交于
      POWER4 has been broken since at least the change 49d09bf2
      ("powerpc/64s: Optimise MSR handling in exception handling"), which
      requires mtmsrd L=1 support. This was introduced in ISA v2.01, and
      POWER4 supports ISA v2.00.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      471d7ff8
    • A
      powerpc/kvm: Fix guest boot failure on Power9 since DAWR changes · ca9a16c3
      Aneesh Kumar K.V 提交于
      SLOF checks for 'sc 1' (hypercall) support by issuing a hcall with
      H_SET_DABR. Since the recent commit e8ebedbf ("KVM: PPC: Book3S
      HV: Return error from h_set_dabr() on POWER9") changed H_SET_DABR to
      return H_UNSUPPORTED on Power9, we see guest boot failures, the
      symptom is the boot seems to just stop in SLOF, eg:
      
        SLOF ***************************************************************
        QEMU Starting
         Build Date = Sep 24 2017 12:23:07
         FW Version = buildd@ release 20170724
        <no further output>
      
      SLOF can cope if H_SET_DABR returns H_HARDWARE. So wwitch the return
      value to H_HARDWARE instead of H_UNSUPPORTED so that we don't break
      the guest boot.
      
      That does mean we return a different error to PowerVM in this case,
      but that's probably not a big concern.
      
      Fixes: e8ebedbf ("KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9")
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ca9a16c3
  8. 30 3月, 2018 3 次提交
  9. 28 3月, 2018 1 次提交
    • P
      KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot() in page fault handler · 31c8b0d0
      Paul Mackerras 提交于
      This changes the hypervisor page fault handler for radix guests to use
      the generic KVM __gfn_to_pfn_memslot() function instead of using
      get_user_pages_fast() and then handling the case of VM_PFNMAP vmas
      specially.  The old code missed the case of VM_IO vmas; with this
      change, VM_IO vmas will now be handled correctly by code within
      __gfn_to_pfn_memslot.
      
      Currently, __gfn_to_pfn_memslot calls hva_to_pfn, which only uses
      __get_user_pages_fast for the initial lookup in the cases where
      either atomic or async is set.  Since we are not setting either
      atomic or async, we do our own __get_user_pages_fast first, for now.
      
      This also adds code to check for the KVM_MEM_READONLY flag on the
      memslot.  If it is set and this is a write access, we synthesize a
      data storage interrupt for the guest.
      
      In the case where the page is not normal RAM (i.e. page == NULL in
      kvmppc_book3s_radix_page_fault(), we read the PTE from the Linux page
      tables because we need the mapping attribute bits as well as the PFN.
      (The mapping attribute bits indicate whether accesses have to be
      non-cacheable and/or guarded.)
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      31c8b0d0
  10. 27 3月, 2018 3 次提交
  11. 23 3月, 2018 2 次提交
    • P
      KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state · 681c617b
      Paul Mackerras 提交于
      This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
      where the contents of the TEXASR can get corrupted while a thread is
      in fake suspend state.  The workaround is for the instruction emulation
      code to use the value saved at the most recent guest exit in real
      suspend mode.  We achieve this by simply not saving the TEXASR into
      the vcpu struct on an exit in fake suspend state.  We also have to
      take care to set the orig_texasr field only on guest exit in real
      suspend state.
      
      This also means that on guest entry in fake suspend state, TEXASR
      will be restored to the value it had on the last exit in real suspend
      state, effectively counteracting any hardware-caused corruption.  This
      works because TEXASR may not be written in suspend state.
      
      With this, the guest might see the wrong values in TEXASR if it reads
      it while in suspend state, but will see the correct value in
      non-transactional state (e.g. after a treclaim), and treclaim will
      work correctly.
      
      With this workaround, the code will actually run slightly faster, and
      will operate correctly on systems without the TEXASR bug (since TEXASR
      may not be written in suspend state, and is only changed by failure
      recording, which will have already been done before we get into fake
      suspend state).  Therefore these changes are not made subject to a CPU
      feature bit.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      681c617b
    • S
      KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode · 87a11bb6
      Suraj Jitindar Singh 提交于
      This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
      where a treclaim performed in fake suspend mode can cause subsequent
      reads from the XER register to return inconsistent values for the SO
      (summary overflow) bit.  The inconsistent SO bit state can potentially
      be observed on any thread in the core.  We have to do the treclaim
      because that is the only way to get the thread out of suspend state
      (fake or real) and into non-transactional state.
      
      The workaround for the bug is to force the core into SMT4 mode before
      doing the treclaim.  This patch adds the code to do that, conditional
      on the CPU_FTR_P9_TM_XER_SO_BUG feature bit.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      87a11bb6