1. 01 Jun, 2018 (19 commits)
  2. 31 May, 2018 (3 commits)
    • KVM: PPC: Book3S PR: Move kvmppc_save_tm/kvmppc_restore_tm to separate file · 009c872a
      Committed by Simon Guo
      This is a simple patch that just moves the kvmppc_save_tm()/kvmppc_restore_tm()
      functionality to tm.S. There is no logic change. Those APIs will be
      restructured in later patches to improve readability.
      
      This is in preparation for reusing those APIs in both HV and PR PPC KVM.
      
      Slight changes made while moving the functions include:
      - surrounding some HV KVM specific code with CONFIG_KVM_BOOK3S_HV_POSSIBLE
      so that it compiles.
      - using _GLOBAL() to define kvmppc_save_tm()/kvmppc_restore_tm()
      
      [paulus@ozlabs.org - rebased on top of 7b0e827c ("KVM: PPC: Book3S HV:
       Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)]
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Factor fake-suspend handling out of kvmppc_save/restore_tm · 7b0e827c
      Committed by Paul Mackerras
      This splits out the handling of "fake suspend" mode, part of the
      hypervisor TM assist code for POWER9, and puts almost all of it in
      new kvmppc_save_tm_hv and kvmppc_restore_tm_hv functions.  The new
      functions branch to kvmppc_save/restore_tm if the CPU does not
      require hypervisor TM assistance.
      
      With this, it will be more straightforward to move kvmppc_save_tm and
      kvmppc_restore_tm to another file and use them for transactional
      memory support in PR KVM.  It also makes the code a
      bit clearer and reduces the number of feature sections.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S PR: Allow KVM_PPC_CONFIGURE_V3_MMU to succeed · 9617a0b3
      Committed by Paul Mackerras
      Currently, PR KVM does not implement the configure_mmu operation, and
      so the KVM_PPC_CONFIGURE_V3_MMU ioctl always fails with an EINVAL
      error.  This causes recent kernels to fail to boot as a PR KVM guest
      on POWER9, since recent kernels booted in HPT mode do the
      H_REGISTER_PROC_TBL hypercall, which causes userspace (QEMU) to do
      KVM_PPC_CONFIGURE_V3_MMU, which fails.
      
      This implements a minimal configure_mmu operation for PR KVM.  It
      succeeds only if the MMU is being configured for HPT mode and no
      process table is being registered.  This is enough to get recent
      kernels to boot as a PR KVM guest.
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Tested-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
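      The acceptance rule above (succeed only for HPT mode with no process
      table) can be sketched in C as follows. This is an illustrative model,
      not the kernel's code: the struct and flag are hypothetical stand-ins
      loosely modeled on the uapi struct kvm_ppc_mmuv3_cfg (flags,
      process_table) and its radix flag, and any other flags are ignored.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bit meaning "guest wants radix"; a stand-in for the
 * real uapi radix flag, not its definition. */
#define SKETCH_MMU_RADIX 0x1u

/* Simplified, hypothetical stand-in for the ioctl argument. */
struct sketch_mmu_cfg {
    uint64_t flags;          /* requested MMU mode flags */
    uint64_t process_table;  /* 0 means no process table is registered */
};

/* Minimal configure_mmu for PR KVM as the commit describes it:
 * accept HPT mode without a process table, reject everything else. */
int sketch_configure_mmu_pr(const struct sketch_mmu_cfg *cfg)
{
    if (cfg->flags & SKETCH_MMU_RADIX)
        return -EINVAL;      /* radix mode is not supported */
    if (cfg->process_table != 0)
        return -EINVAL;      /* registering a process table is not supported */
    return 0;                /* HPT mode, no process table: succeed */
}
```

      With this rule, the H_REGISTER_PROC_TBL path that a recent HPT-mode
      guest triggers no longer fails, because it requests neither radix nor
      a process table.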
  3. 24 May, 2018 (3 commits)
  4. 22 May, 2018 (7 commits)
  5. 18 May, 2018 (8 commits)
    • KVM: PPC: Book3S PR: Enable use on POWER9 inside HPT-mode guests · ec531d02
      Committed by Paul Mackerras
      This relaxes the restriction on using PR KVM on POWER9.  The existing
      code does work inside a guest partition running in HPT mode, because
      hypercalls such as H_ENTER use the old HPTE format, not the new
      format used by POWER9, and so no change to PR KVM's HPT manipulation
      code is required.  PR KVM will still refuse to run if the kernel is
      using radix translation or if it is running bare-metal.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
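      The remaining POWER9 restriction can be expressed as a simple predicate.
      This is a hypothetical sketch: the struct and function names are
      invented for illustration, and non-POWER9 hosts are simply assumed to
      be allowed, since this commit only concerns POWER9.

```c
#include <stdbool.h>

/* Hypothetical description of the host PR KVM is running on. */
struct sketch_host {
    bool is_power9;      /* CPU implements ISA v3.00 */
    bool radix_enabled;  /* kernel uses radix translation */
    bool bare_metal;     /* kernel runs bare-metal, not in a guest partition */
};

/* True if PR KVM may run under the rule stated in the commit: on
 * POWER9, only inside a guest partition that uses HPT translation. */
bool sketch_pr_kvm_allowed(const struct sketch_host *h)
{
    if (!h->is_power9)
        return true;     /* assumption: the POWER9 rule does not apply */
    return !h->radix_enabled && !h->bare_metal;
}
```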
    • KVM: PPC: Book3S HV: Send kvmppc_bad_interrupt NMIs to Linux handlers · 7c1bd80c
      Committed by Nicholas Piggin
      It's possible to take an SRESET or MCE in these paths due to a bug
      in the host code, an NMI IPI, etc. A recent bug that attempted to load
      a virtual address from real mode gave this complete but cryptic error,
      abridged:
      
            Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
            LE SMP NR_CPUS=2048 NUMA PowerNV
            CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted
            NIP:  c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580
            REGS: c000000fff76dd80 TRAP: 0200   Not tainted
            MSR:  9000000000201003 <SF,HV,ME,RI,LE>  CR: 48082222  XER: 00000000
            CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3
            NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
            LR [c0000000000c2430] do_tlbies+0x230/0x2f0
      
      Sending the NMIs through the Linux handlers gives a nicer output:
      
            Severe Machine check interrupt [Not recovered]
              NIP [c0000000000155ac]: perf_trace_tlbie+0x2c/0x1a0
              Initiator: CPU
              Error type: Real address [Load (bad)]
                Effective address: d00017fffcc01a28
            opal: Machine check interrupt unrecoverable: MSR(RI=0)
            opal: Hardware platform error: Unrecoverable Machine Check exception
            CPU: 0 PID: 6700 Comm: qemu-system-ppc Tainted: G   M
            NIP:  c0000000000155ac LR: c0000000000c23c0 CTR: c000000000015580
            REGS: c000000fff9e9d80 TRAP: 0200   Tainted: G   M
            MSR:  9000000000201001 <SF,HV,ME,LE>  CR: 48082222  XER: 00000000
            CFAR: 000000010cbc1a30 DAR: d00017fffcc01a28 DSISR: 00000040 SOFTE: 3
            NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
            LR [c0000000000c23c0] do_tlbies+0x1c0/0x280
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Fix kvmppc_bad_host_intr for real mode interrupts · eadce3b4
      Committed by Nicholas Piggin
      When CONFIG_RELOCATABLE=n, the Linux real mode interrupt handlers call
      into KVM using real addresses. These need to be translated to kernel
      linear effective addresses before the MMU is switched on.
      
      kvmppc_bad_host_intr does not add these bits, so when it is used to
      handle a system reset interrupt (which is always delivered in real
      mode), it causes an instruction access fault immediately after
      the MMU is turned on.
      
      Fix this by ensuring the top 2 address bits are set when the MMU is
      turned on.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
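      On 64-bit Book3S Linux the kernel linear mapping starts at
      0xc000000000000000, so "setting the top 2 address bits" turns a real
      address into its linear effective address. The fix itself is in
      assembly; this C helper is only a hypothetical illustration of the
      arithmetic.

```c
#include <stdint.h>

/* Base of the 64-bit Book3S kernel linear mapping: top two bits set. */
#define SKETCH_LINEAR_BASE 0xc000000000000000ULL

/* Hypothetical helper: map a real address to the kernel linear
 * effective address by setting the top two address bits. */
uint64_t sketch_real_to_linear(uint64_t real_addr)
{
    return real_addr | SKETCH_LINEAR_BASE;
}
```

      Because the operation is a bitwise OR, applying it to an address that
      is already linear leaves it unchanged.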
    • KVM: PPC: Book3S HV: radix: Do not clear partition PTE when RC or write bits do not match · 878cf2bb
      Committed by Nicholas Piggin
      Adding the write bit and RC bits to pte permissions does not require a
      pte clear and flush. There should not be other bits changed here,
      because restricting access or changing the PFN must have already
      invalidated any existing ptes (otherwise the race is already lost).
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
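      The condition under which an existing pte can be upgraded in place can
      be sketched as a bit test. The commit relies on callers never changing
      other bits here; this sketch makes that assumption explicit as a check.
      The bit positions are invented for illustration and do not reflect the
      kernel's PTE layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; not the kernel's PTE layout. */
#define SK_PTE_WRITE    (1ULL << 1)
#define SK_PTE_DIRTY    (1ULL << 7)   /* C (change) bit */
#define SK_PTE_ACCESSED (1ULL << 8)   /* R (reference) bit */

/* Bits that may be added to a valid pte without a clear and flush. */
#define SK_ADD_ONLY_MASK (SK_PTE_WRITE | SK_PTE_DIRTY | SK_PTE_ACCESSED)

/* True if new_pte differs from old_pte only by adding write/R/C bits,
 * so the pte can be updated in place instead of cleared and flushed. */
bool sketch_can_update_in_place(uint64_t old_pte, uint64_t new_pte)
{
    uint64_t changed = old_pte ^ new_pte;

    if (changed & ~SK_ADD_ONLY_MASK)
        return false;                /* other bits (e.g. the PFN) changed */
    return (changed & old_pte) == 0; /* no permission bit was removed */
}
```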
    • KVM: PPC: Book3S HV: radix: Refine IO region partition scope attributes · bc64dd0e
      Committed by Nicholas Piggin
      When the radix fault handler has no page from the process address
      space (e.g., for IO memory), it looks up the process pte and uses it
      to set the partition table pte, so that it gets attributes such as CI
      and guarded. If the process table entry is to be writable, set
      _PAGE_DIRTY as well to avoid an RC update. If not, ensure _PAGE_DIRTY
      does not carry across. Set _PAGE_ACCESSED as well to avoid an RC update.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
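      The attribute rules above (always set _PAGE_ACCESSED; set _PAGE_DIRTY
      only for writable mappings, otherwise strip it) can be sketched as
      below. The function name and bit values are hypothetical; other
      attribute bits such as CI and guarded are simply carried over from the
      process pte.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only; not the kernel's PTE layout. */
#define SK_PAGE_DIRTY    (1ULL << 7)
#define SK_PAGE_ACCESSED (1ULL << 8)

/* Derive partition-scope pte bits from the process pte for memory with
 * no backing page (e.g. IO). Other attribute bits pass through. */
uint64_t sketch_io_pte_attrs(uint64_t process_pte, bool writable)
{
    uint64_t pte = process_pte | SK_PAGE_ACCESSED; /* avoid an R update */

    if (writable)
        pte |= SK_PAGE_DIRTY;   /* avoid a C update on first write */
    else
        pte &= ~SK_PAGE_DIRTY;  /* do not carry the C bit across */
    return pte;
}
```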
    • KVM: PPC: Book3S HV: Make radix handle process scoped LPID flush in C, with relocation on · 9a4506e1
      Committed by Nicholas Piggin
      The radix guest code has fewer restrictions on the context it
      can run in, so move this flushing out of assembly and have it use the
      Linux TLB flush implementations introduced previously.
      
      This allows powerpc:tlbie trace events to be used.
      
      This changes the tlbiel sequence to execute a RIC=2 flush only once, on
      the first set flushed, then RIC=0 for the rest of the sets. The end
      result of the flush should be unchanged. This matches the local PID
      flush pattern that was introduced in a5998fcb ("powerpc/mm/radix:
      Optimise tlbiel flush all case").
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
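      The new flush pattern (RIC=2 on the first set, RIC=0 on every other
      set) can be modeled by recording which RIC value each set would use.
      This hypothetical C sketch only computes the sequence; it does not
      issue tlbiel instructions.

```c
#include <stddef.h>

/* Fill ric[0..nsets-1] with the RIC value used when flushing each TLB
 * set: a full RIC=2 flush on the first set, RIC=0 for the remainder. */
void sketch_tlbiel_ric_sequence(int *ric, size_t nsets)
{
    for (size_t set = 0; set < nsets; set++)
        ric[set] = (set == 0) ? 2 : 0;
}
```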
    • KVM: PPC: Book3S HV: Make radix use the Linux translation flush functions for partition scope · d91cb39f
      Committed by Nicholas Piggin
      This has the advantage of consolidating TLB flush code in fewer
      places, and it also implements powerpc:tlbie trace events.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Recursively unmap all page table entries when unmapping · a5704e83
      Committed by Nicholas Piggin
      When partition scope mappings are unmapped with kvm_unmap_radix, the
      pte is cleared, but the page table structure is left in place. If the
      next page fault requests a different page table geometry (e.g., due to
      THP promotion or split), kvmppc_create_pte is responsible for changing
      the page tables.
      
      When a page table entry is to be converted to a large pte, the page
      table entry is cleared, the PWC flushed, then the page table it points
      to freed. This causes pte page tables to leak when a 1GB page is
      to replace a pud entry that points to a pmd table with pte tables under
      it: the pmd table is freed, but its pte tables are missed.
      
      Fix this by replacing the simple clear and free code with one that
      walks down the page tables and frees children. Care must be taken to
      clear the root entry being unmapped and then flush the PWC before
      freeing any page tables, as explained in comments.
      
      This requires PWC flush to logically become a flush-all-PWC (which it
      already is in hardware, but the KVM API needs to be changed to avoid
      confusion).
      
      This code also checks that no unexpected pte entries exist in any page
      table being freed, unmapping any it finds and emitting a WARN. This is an
      expensive operation for the pte page level, but partition scope
      changes are rare, so it's unconditional for now to iron out bugs. It
      can be put under a CONFIG option or removed after some time.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
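      The unmap ordering described above (clear the root entry, flush the
      PWC, then free the whole subtree, warning about any leftover valid
      ptes) can be illustrated on a toy multi-level table. Everything here is
      hypothetical: the node layout, the tiny fan-out, and the counters that
      stand in for the real PWC flush and page freeing.

```c
#include <stdio.h>
#include <stdlib.h>

#define SK_ENTRIES 4   /* tiny fan-out, for illustration only */

/* Toy page-table node: level 0 holds leaf ptes (nonzero = valid),
 * higher levels hold pointers to child tables. */
struct sk_table {
    int level;
    struct sk_table *child[SK_ENTRIES]; /* used when level > 0 */
    unsigned long pte[SK_ENTRIES];      /* used when level == 0 */
};

static int sk_pwc_flushes;     /* simulated page-walk-cache flushes */
static int sk_tables_freed;    /* table pages released */
static int sk_unexpected_ptes; /* valid ptes found while freeing */

static void sk_flush_pwc(void) { sk_pwc_flushes++; }

/* Free a detached subtree bottom-up. Any valid leaf pte still present
 * is unexpected: clear it and warn, as the commit does with WARN. */
static void sk_free_subtree(struct sk_table *t)
{
    for (int i = 0; i < SK_ENTRIES; i++) {
        if (t->level == 0) {
            if (t->pte[i]) {
                fprintf(stderr, "warning: unexpected pte at index %d\n", i);
                sk_unexpected_ptes++;
                t->pte[i] = 0;
            }
        } else if (t->child[i]) {
            sk_free_subtree(t->child[i]);  /* free children first */
            t->child[i] = NULL;
        }
    }
    free(t);
    sk_tables_freed++;
}

/* Unmap the table hanging off parent->child[idx]. */
void sk_unmap_entry(struct sk_table *parent, int idx)
{
    struct sk_table *sub = parent->child[idx];

    if (!sub)
        return;
    parent->child[idx] = NULL; /* 1. clear the root entry being unmapped */
    sk_flush_pwc();            /* 2. flush the PWC so no walk can reach sub */
    sk_free_subtree(sub);      /* 3. only now free the pmd and pte tables */
}
```

      Freeing recursively is what closes the leak: with a pud entry that
      points to a pmd table, both the pmd table and its pte tables are
      released, rather than the pmd table alone.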