1. 31 3月, 2018 1 次提交
    • A
      powerpc/kvm: Fix guest boot failure on Power9 since DAWR changes · ca9a16c3
      Aneesh Kumar K.V 提交于
      SLOF checks for 'sc 1' (hypercall) support by issuing a hcall with
      H_SET_DABR. Since the recent commit e8ebedbf ("KVM: PPC: Book3S
      HV: Return error from h_set_dabr() on POWER9") changed H_SET_DABR to
      return H_UNSUPPORTED on Power9, we see guest boot failures, the
      symptom is the boot seems to just stop in SLOF, eg:
      
        SLOF ***************************************************************
        QEMU Starting
         Build Date = Sep 24 2017 12:23:07
         FW Version = buildd@ release 20170724
        <no further output>
      
      SLOF can cope if H_SET_DABR returns H_HARDWARE. So wwitch the return
      value to H_HARDWARE instead of H_UNSUPPORTED so that we don't break
      the guest boot.
      
      That does mean we return a different error to PowerVM in this case,
      but that's probably not a big concern.
      
      Fixes: e8ebedbf ("KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9")
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ca9a16c3
  2. 30 3月, 2018 3 次提交
  3. 27 3月, 2018 3 次提交
  4. 23 3月, 2018 4 次提交
    • P
      KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state · 681c617b
      Paul Mackerras 提交于
      This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
      where the contents of the TEXASR can get corrupted while a thread is
      in fake suspend state.  The workaround is for the instruction emulation
      code to use the value saved at the most recent guest exit in real
      suspend mode.  We achieve this by simply not saving the TEXASR into
      the vcpu struct on an exit in fake suspend state.  We also have to
      take care to set the orig_texasr field only on guest exit in real
      suspend state.
      
      This also means that on guest entry in fake suspend state, TEXASR
      will be restored to the value it had on the last exit in real suspend
      state, effectively counteracting any hardware-caused corruption.  This
      works because TEXASR may not be written in suspend state.
      
      With this, the guest might see the wrong values in TEXASR if it reads
      it while in suspend state, but will see the correct value in
      non-transactional state (e.g. after a treclaim), and treclaim will
      work correctly.
      
      With this workaround, the code will actually run slightly faster, and
      will operate correctly on systems without the TEXASR bug (since TEXASR
      may not be written in suspend state, and is only changed by failure
      recording, which will have already been done before we get into fake
      suspend state).  Therefore these changes are not made subject to a CPU
      feature bit.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      681c617b
    • S
      KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode · 87a11bb6
      Suraj Jitindar Singh 提交于
      This works around a hardware bug in "Nimbus" POWER9 DD2.2 processors,
      where a treclaim performed in fake suspend mode can cause subsequent
      reads from the XER register to return inconsistent values for the SO
      (summary overflow) bit.  The inconsistent SO bit state can potentially
      be observed on any thread in the core.  We have to do the treclaim
      because that is the only way to get the thread out of suspend state
      (fake or real) and into non-transactional state.
      
      The workaround for the bug is to force the core into SMT4 mode before
      doing the treclaim.  This patch adds the code to do that, conditional
      on the CPU_FTR_P9_TM_XER_SO_BUG feature bit.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      87a11bb6
    • P
      KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9 · 4bb3c7a0
      Paul Mackerras 提交于
      POWER9 has hardware bugs relating to transactional memory and thread
      reconfiguration (changes to hardware SMT mode).  Specifically, the core
      does not have enough storage to store a complete checkpoint of all the
      architected state for all four threads.  The DD2.2 version of POWER9
      includes hardware modifications designed to allow hypervisor software
      to implement workarounds for these problems.  This patch implements
      those workarounds in KVM code so that KVM guests see a full, working
      transactional memory implementation.
      
      The problems center around the use of TM suspended state, where the
      CPU has a checkpointed state but execution is not transactional.  The
      workaround is to implement a "fake suspend" state, which looks to the
      guest like suspended state but the CPU does not store a checkpoint.
      In this state, any instruction that would cause a transition to
      transactional state (rfid, rfebb, mtmsrd, tresume) or would use the
      checkpointed state (treclaim) causes a "soft patch" interrupt (vector
      0x1500) to the hypervisor so that it can be emulated.  The trechkpt
      instruction also causes a soft patch interrupt.
      
      On POWER9 DD2.2, we avoid returning to the guest in any state which
      would require a checkpoint to be present.  The trechkpt in the guest
      entry path which would normally create that checkpoint is replaced by
      either a transition to fake suspend state, if the guest is in suspend
      state, or a rollback to the pre-transactional state if the guest is in
      transactional state.  Fake suspend state is indicated by a flag in the
      PACA plus a new bit in the PSSCR.  The new PSSCR bit is write-only and
      reads back as 0.
      
      On exit from the guest, if the guest is in fake suspend state, we still
      do the treclaim instruction as we would in real suspend state, in order
      to get into non-transactional state, but we do not save the resulting
      register state since there was no checkpoint.
      
      Emulation of the instructions that cause a softpatch interrupt is
      handled in two paths.  If the guest is in real suspend mode, we call
      kvmhv_p9_tm_emulation_early() to handle the cases where the guest is
      transitioning to transactional state.  This is called before we do the
      treclaim in the guest exit path; because we haven't done treclaim, we
      can get back to the guest with the transaction still active.  If the
      instruction is a case that kvmhv_p9_tm_emulation_early() doesn't
      handle, or if the guest is in fake suspend state, then we proceed to
      do the complete guest exit path and subsequently call
      kvmhv_p9_tm_emulation() in host context with the MMU on.  This handles
      all the cases including the cases that generate program interrupts
      (illegal instruction or TM Bad Thing) and facility unavailable
      interrupts.
      
      The emulation is reasonably straightforward and is mostly concerned
      with checking for exception conditions and updating the state of
      registers such as MSR and CR0.  The treclaim emulation takes care to
      ensure that the TEXASR register gets updated as if it were the guest
      treclaim instruction that had done failure recording, not the treclaim
      done in hypervisor state in the guest exit path.
      
      With this, the KVM_CAP_PPC_HTM capability returns true (1) even if
      transactional memory is not available to host userspace.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4bb3c7a0
    • A
      powerpc/mm: Fixup tlbie vs store ordering issue on POWER9 · a5d4b589
      Aneesh Kumar K.V 提交于
      On POWER9, under some circumstances, a broadcast TLB invalidation
      might complete before all previous stores have drained, potentially
      allowing stale stores from becoming visible after the invalidation.
      This works around it by doubling up those TLB invalidations which was
      verified by HW to be sufficient to close the risk window.
      
      This will be documented in a yet-to-be-published errata.
      
      Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines")
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Enable the feature in the DT CPU features code for all Power9,
            rename the feature to CPU_FTR_P9_TLBIE_BUG per benh.]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a5d4b589
  5. 22 2月, 2018 1 次提交
    • I
      treewide/trivial: Remove ';;$' typo noise · ed7158ba
      Ingo Molnar 提交于
      On lkml suggestions were made to split up such trivial typo fixes into per subsystem
      patches:
      
        --- a/arch/x86/boot/compressed/eboot.c
        +++ b/arch/x86/boot/compressed/eboot.c
        @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
                struct efi_uga_draw_protocol *uga = NULL, *first_uga;
                efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
                unsigned long nr_ugas;
        -       u32 *handles = (u32 *)uga_handle;;
        +       u32 *handles = (u32 *)uga_handle;
                efi_status_t status = EFI_INVALID_PARAMETER;
                int i;
      
      This patch is the result of the following script:
      
        $ sed -i 's/;;$/;/g' $(git grep -E ';;$'  | grep "\.[ch]:"  | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)
      
      ... followed by manual review to make sure it's all good.
      
      Splitting this up is just crazy talk, let's get over with this and just do it.
      Reported-by: NPavel Machek <pavel@ucw.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ed7158ba
  6. 13 2月, 2018 2 次提交
    • P
      KVM: PPC: Book3S: Fix compile error that occurs with some gcc versions · 6df3877f
      Paul Mackerras 提交于
      Some versions of gcc generate a warning that the variable "emulated"
      may be used uninitialized in function kvmppc_handle_load128_by2x64().
      It would be used uninitialized if kvmppc_handle_load128_by2x64 was
      ever called with vcpu->arch.mmio_vmx_copy_nums == 0, but neither of
      the callers ever do that, so there is no actual bug.  When gcc
      generates a warning, it causes the build to fail because arch/powerpc
      is compiled with -Werror.
      
      This silences the warning by initializing "emulated" to EMULATE_DONE.
      
      Fixes: 09f98496 ("KVM: PPC: Book3S: Add MMIO emulation for VMX instructions")
      Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      6df3877f
    • P
      KVM: PPC: Fix compile error that occurs when CONFIG_ALTIVEC=n · c662f773
      Paul Mackerras 提交于
      Commit accb757d ("KVM: Move vcpu_load to arch-specific
      kvm_arch_vcpu_ioctl_run", 2017-12-04) added a "goto out"
      statement and an "out:" label to kvm_arch_vcpu_ioctl_run().
      Since the only "goto out" is inside a CONFIG_VSX block,
      compiling with CONFIG_VSX=n gives a warning that label "out"
      is defined but not used, and because arch/powerpc is compiled
      with -Werror, that becomes a compile error that makes the kernel
      build fail.
      
      Merge commit 1ab03c07 ("Merge tag 'kvm-ppc-next-4.16-2' of
      git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc",
      2018-02-09) added a similar block of code inside a #ifdef
      CONFIG_ALTIVEC, with a "goto out" statement.
      
      In order to make the build succeed, this adds a #ifdef around the
      "out:" label.  This is a minimal, ugly fix, to be replaced later
      by a refactoring of the code.  Since CONFIG_VSX depends on
      CONFIG_ALTIVEC, it is sufficient to use #ifdef CONFIG_ALTIVEC here.
      
      Fixes: accb757d ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_run")
      Reported-by: NChristian Zigotzky <chzigotzky@xenosoft.de>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      c662f773
  7. 09 2月, 2018 4 次提交
    • J
      KVM: PPC: Book3S: Add MMIO emulation for VMX instructions · 09f98496
      Jose Ricardo Ziviani 提交于
      This patch provides the MMIO load/store vector indexed
      X-Form emulation.
      
      Instructions implemented:
      lvx: the quadword in storage addressed by the result of EA &
      0xffff_ffff_ffff_fff0 is loaded into VRT.
      
      stvx: the contents of VRS are stored into the quadword in storage
      addressed by the result of EA & 0xffff_ffff_ffff_fff0.
      Reported-by: NGopesh Kumar Chaudhary <gopchaud@in.ibm.com>
      Reported-by: NBalamuruhan S <bala24@linux.vnet.ibm.com>
      Signed-off-by: NJose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      09f98496
    • A
      KVM: PPC: Book3S HV: Branch inside feature section · d20fe50a
      Alexander Graf 提交于
      We ended up with code that did a conditional branch inside a feature
      section to code outside of the feature section. Depending on how the
      object file gets organized, that might mean we exceed the 14bit
      relocation limit for conditional branches:
      
        arch/powerpc/kvm/built-in.o:arch/powerpc/kvm/book3s_hv_rmhandlers.S:416:(__ftr_alt_97+0x8): relocation truncated to fit: R_PPC64_REL14 against `.text'+1ca4
      
      So instead of doing a conditional branch outside of the feature section,
      let's just jump at the end of the same, making the branch very short.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      d20fe50a
    • D
      KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 · 790a9df5
      David Gibson 提交于
      This adds code to enable the HPT resizing code to work on POWER9,
      which uses a slightly modified HPT entry format compared to POWER8.
      On POWER9, we convert HPTEs read from the HPT from the new format to
      the old format so that the rest of the HPT resizing code can work as
      before.  HPTEs written to the new HPT are converted to the new format
      as the last step before writing them into the new HPT.
      
      This takes out the checks added by commit bcd3bb63 ("KVM: PPC:
      Book3S HV: Disable HPT resizing on POWER9 for now", 2017-02-18),
      now that HPT resizing works on POWER9.
      
      On POWER9, when we pivot to the new HPT, we now call
      kvmppc_setup_partition_table() to update the partition table in order
      to make the hardware use the new HPT.
      
      [paulus@ozlabs.org - added kvmppc_setup_partition_table() call,
       wrote commit message.]
      Tested-by: NLaurent Vivier <lvivier@redhat.com>
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      790a9df5
    • P
      KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code · 05f2bb03
      Paul Mackerras 提交于
      This fixes the computation of the HPTE index to use when the HPT
      resizing code encounters a bolted HPTE which is stored in its
      secondary HPTE group.  The code inverts the HPTE group number, which
      is correct, but doesn't then mask it with new_hash_mask.  As a result,
      new_pteg will be effectively negative, resulting in new_hptep
      pointing before the new HPT, which will corrupt memory.
      
      In addition, this removes two BUG_ON statements.  The condition that
      the BUG_ONs were testing -- that we have computed the hash value
      incorrectly -- has never been observed in testing, and if it did
      occur, would only affect the guest, not the host.  Given that
      BUG_ON should only be used in conditions where the kernel (i.e.
      the host kernel, in this case) can't possibly continue execution,
      it is not appropriate here.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      05f2bb03
  8. 08 2月, 2018 1 次提交
  9. 01 2月, 2018 2 次提交
    • A
      KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled · 07ae5389
      Alexander Graf 提交于
      When copying between the vcpu and svcpu, we may get scheduled away onto
      a different host CPU which in turn means our svcpu pointer may change.
      
      That means we need to atomically copy to and from the svcpu with preemption
      disabled, so that all code around it always sees a coherent state.
      Reported-by: NSimon Guo <wei.guo.simon@gmail.com>
      Fixes: 3d3319b4 ("KVM: PPC: Book3S: PR: Enable interrupts earlier")
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      07ae5389
    • P
      KVM: PPC: Book3S HV: Drop locks before reading guest memory · 36ee41d1
      Paul Mackerras 提交于
      Running with CONFIG_DEBUG_ATOMIC_SLEEP reveals that HV KVM tries to
      read guest memory, in order to emulate guest instructions, while
      preempt is disabled and a vcore lock is held.  This occurs in
      kvmppc_handle_exit_hv(), called from post_guest_process(), when
      emulating guest doorbell instructions on POWER9 systems, and also
      when checking whether we have hit a hypervisor breakpoint.
      Reading guest memory can cause a page fault and thus cause the
      task to sleep, so we need to avoid reading guest memory while
      holding a spinlock or when preempt is disabled.
      
      To fix this, we move the preempt_enable() in kvmppc_run_core() to
      before the loop that calls post_guest_process() for each vcore that
      has just run, and we drop and re-take the vcore lock around the calls
      to kvmppc_emulate_debug_inst() and kvmppc_emulate_doorbell_instr().
      
      Dropping the lock is safe with respect to the iteration over the
      runnable vcpus in post_guest_process(); for_each_runnable_thread
      is actually safe to use locklessly.  It is possible for a vcpu
      to become runnable and add itself to the runnable_threads array
      (code near the beginning of kvmppc_run_vcpu()) and then get included
      in the iteration in post_guest_process despite the fact that it
      has not just run.  This is benign because vcpu->arch.trap and
      vcpu->arch.ceded will be zero.
      
      Cc: stable@vger.kernel.org # v4.13+
      Fixes: 57900694 ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9")
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      36ee41d1
  10. 22 1月, 2018 1 次提交
    • R
      powerpc: Use octal numbers for file permissions · 57ad583f
      Russell Currey 提交于
      Symbolic macros are unintuitive and hard to read, whereas octal constants
      are much easier to interpret.  Replace macros for the basic permission
      flags (user/group/other read/write/execute) with numeric constants
      instead, across the whole powerpc tree.
      
      Introducing a significant number of changes across the tree for no runtime
      benefit isn't exactly desirable, but so long as these macros are still
      used in the tree people will keep sending patches that add them.  Not only
      are they hard to parse at a glance, there are multiple ways of coming to
      the same value (as you can see with 0444 and 0644 in this patch) which
      hurts readability.
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Reviewed-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      57ad583f
  11. 21 1月, 2018 1 次提交
  12. 19 1月, 2018 8 次提交
  13. 18 1月, 2018 2 次提交
    • P
      KVM: PPC: Book3S HV: Improve handling of debug-trigger HMIs on POWER9 · d075745d
      Paul Mackerras 提交于
      Hypervisor maintenance interrupts (HMIs) are generated by various
      causes, signalled by bits in the hypervisor maintenance exception
      register (HMER).  In most cases calling OPAL to handle the interrupt
      is the correct thing to do, but the "debug trigger" HMIs signalled by
      PPC bit 17 (bit 46) of HMER are used to invoke software workarounds
      for hardware bugs, and OPAL does not have any code to handle this
      cause.  The debug trigger HMI is used in POWER9 DD2.0 and DD2.1 chips
      to work around a hardware bug in executing vector load instructions to
      cache inhibited memory.  In POWER9 DD2.2 chips, it is generated when
      conditions are detected relating to threads being in TM (transactional
      memory) suspended mode when the core SMT configuration needs to be
      reconfigured.
      
      The kernel currently has code to detect the vector CI load condition,
      but only when the HMI occurs in the host, not when it occurs in a
      guest.  If a HMI occurs in the guest, it is always passed to OPAL, and
      then we always re-sync the timebase, because the HMI cause might have
      been a timebase error, for which OPAL would re-sync the timebase, thus
      removing the timebase offset which KVM applied for the guest.  Since
      we don't know what OPAL did, we don't know whether to subtract the
      timebase offset from the timebase, so instead we re-sync the timebase.
      
      This adds code to determine explicitly what the cause of a debug
      trigger HMI will be.  This is based on a new device-tree property
      under the CPU nodes called ibm,hmi-special-triggers, if it is
      present, or otherwise based on the PVR (processor version register).
      The handling of debug trigger HMIs is pulled out into a separate
      function which can be called from the KVM guest exit code.  If this
      function handles and clears the HMI, and no other HMI causes remain,
      then we skip calling OPAL and we proceed to subtract the guest
      timebase offset from the timebase.
      
      The overall handling for HMIs that occur in the host (i.e. not in a
      KVM guest) is largely unchanged, except that we now don't set the flag
      for the vector CI load workaround on DD2.2 processors.
      
      This also removes a BUG_ON in the KVM code.  BUG_ON is generally not
      useful in KVM guest entry/exit code since it is difficult to handle
      the resulting trap gracefully.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d075745d
    • P
      KVM: PPC: Book3S HV: Allow HPT and radix on the same core for POWER9 v2.2 · 00608e1f
      Paul Mackerras 提交于
      POWER9 chip versions starting with "Nimbus" v2.2 can support running
      with some threads of a core in HPT mode and others in radix mode.
      This means that we don't have to prohibit independent-threads mode
      when running a HPT guest on a radix host, and we don't have to do any
      of the synchronization between threads that was introduced in commit
      c0101509 ("KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix
      hosts", 2017-10-19).
      
      Rather than using up another CPU feature bit, we just do an
      explicit test on the PVR (processor version register) at module
      startup time to determine whether we have to take steps to avoid
      having some threads in HPT mode and some in radix mode (so-called
      "mixed mode").  We test for "Nimbus" (indicated by 0 or 1 in the top
      nibble of the lower 16 bits) v2.2 or later, or "Cumulus" (indicated by
      2 or 3 in that nibble) v1.1 or later.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      00608e1f
  14. 17 1月, 2018 3 次提交
    • N
      powerpc/64s: Improve local TLB flush for boot and MCE on POWER9 · d4748276
      Nicholas Piggin 提交于
      There are several cases outside the normal address space management
      where a CPU's entire local TLB is to be flushed:
      
        1. Booting the kernel, in case something has left stale entries in
           the TLB (e.g., kexec).
      
        2. Machine check, to clean corrupted TLB entries.
      
      One other place where the TLB is flushed, is waking from deep idle
      states. The flush is a side-effect of calling ->cpu_restore with the
      intention of re-setting various SPRs. The flush itself is unnecessary
      because in the first case, the TLB should not acquire new corrupted
      TLB entries as part of sleep/wake (though they may be lost).
      
      This type of TLB flush is coded inflexibly, several times for each CPU
      type, and they have a number of problems with ISA v3.0B:
      
      - The current radix mode of the MMU is not taken into account, it is
        always done as a hash flushn For IS=2 (LPID-matching flush from host)
        and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if
        the R field does not match the current radix mode.
      
      - ISA v3.0B hash must flush the partition and process table caches as
        well.
      
      - ISA v3.0B radix must flush partition and process scoped translations,
        partition and process table caches, and also the page walk cache.
      
      So consolidate the flushing code and implement it in C and inline asm
      under the mm/ directory with the rest of the flush code. Add ISA v3.0B
      cases for radix and hash, and use the radix flush in radix environment.
      
      Provide a way for IS=2 (LPID flush) to specify the radix mode of the
      partition. Have KVM pass in the radix mode of the guest.
      
      Take out the flushes from early cputable/dt_cpu_ftrs detection hooks,
      and move it later in the boot process after, the MMU registers are set
      up and before relocation is first turned on.
      
      The TLB flush is no longer called when restoring from deep idle states.
      This was not be done as a separate step because booting secondaries
      uses the same cpu_restore as idle restore, which needs the TLB flush.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d4748276
    • P
      KVM: PPC: Book3S HV: Do SLB load/unload with guest LPCR value loaded · 6964e6a4
      Paul Mackerras 提交于
      This moves the code that loads and unloads the guest SLB values so that
      it is done while the guest LPCR value is loaded in the LPCR register.
      The reason for doing this is that on POWER9, the behaviour of the
      slbmte instruction depends on the LPCR[UPRT] bit.  If UPRT is 1, as
      it is for a radix host (or guest), the SLB index is truncated to
      2 bits.  This means that for a HPT guest on a radix host, the SLB
      was not being loaded correctly, causing the guest to crash.
      
      The SLB is now loaded much later in the guest entry path, after the
      LPCR is loaded, which for a secondary thread is after it sees that
      the primary thread has switched the MMU to the guest.  The loop that
      waits for the primary thread has a branch out to the exit code that
      is taken if it sees that other threads have commenced exiting the
      guest.  Since we have now not loaded the SLB at this point, we make
      this path branch to a new label 'guest_bypass' and we move the SLB
      unload code to before this label.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      6964e6a4
    • P
      KVM: PPC: Book3S HV: Make sure we don't re-enter guest without XIVE loaded · 43ff3f65
      Paul Mackerras 提交于
      This fixes a bug where it is possible to enter a guest on a POWER9
      system without having the XIVE (interrupt controller) context loaded.
      This can happen because we unload the XIVE context from the CPU
      before doing the real-mode handling for machine checks.  After the
      real-mode handler runs, it is possible that we re-enter the guest
      via a fast path which does not load the XIVE context.
      
      To fix this, we move the unloading of the XIVE context to come after
      the real-mode machine check handler is called.
      
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      43ff3f65
  15. 16 1月, 2018 1 次提交
    • P
      KVM: PPC: Book3S HV: Enable migration of decrementer register · 5855564c
      Paul Mackerras 提交于
      This adds a register identifier for use with the one_reg interface
      to allow the decrementer expiry time to be read and written by
      userspace.  The decrementer expiry time is in guest timebase units
      and is equal to the sum of the decrementer and the guest timebase.
      (The expiry time is used rather than the decrementer value itself
      because the expiry time is not constantly changing, though the
      decrementer value is, while the guest vcpu is not running.)
      
      Without this, a guest vcpu migrated to a new host will see its
      decrementer set to some random value.  On POWER8 and earlier, the
      decrementer is 32 bits wide and counts down at 512MHz, so the
      guest vcpu will potentially see no decrementer interrupts for up
      to about 4 seconds, which will lead to a stall.  With POWER9, the
      decrementer is now 56 bits side, so the stall can be much longer
      (up to 2.23 years) and more noticeable.
      
      To help work around the problem in cases where userspace has not been
      updated to migrate the decrementer expiry time, we now set the
      default decrementer expiry at vcpu creation time to the current time
      rather than the maximum possible value.  This should mean an
      immediate decrementer interrupt when a migrated vcpu starts
      running.  In cases where the decrementer is 32 bits wide and more
      than 4 seconds elapse between the creation of the vcpu and when it
      first runs, the decrementer would have wrapped around to positive
      values and there may still be a stall - but this is no worse than
      the current situation.  In the large-decrementer case, we are sure
      to get an immediate decrementer interrupt (assuming the time from
      vcpu creation to first run is less than 2.23 years) and we thus
      avoid a very long stall.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      5855564c
  16. 11 1月, 2018 2 次提交
  17. 10 1月, 2018 1 次提交
    • D
      KVM: PPC: Book3S HV: Always flush TLB in kvmppc_alloc_reset_hpt() · ecba8297
      David Gibson 提交于
      The KVM_PPC_ALLOCATE_HTAB ioctl(), implemented by kvmppc_alloc_reset_hpt()
      is supposed to completely clear and reset a guest's Hashed Page Table (HPT)
      allocating or re-allocating it if necessary.
      
      In the case where an HPT of the right size already exists and it just
      zeroes it, it forces a TLB flush on all guest CPUs, to remove any stale TLB
      entries loaded from the old HPT.
      
      However, that situation can arise when the HPT is resizing as well - or
      even when switching from an RPT to HPT - so those cases need a TLB flush as
      well.
      
      So, move the TLB flush to trigger in all cases except for errors.
      
      Cc: stable@vger.kernel.org # v4.10+
      Fixes: f98a8bf9 ("KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size")
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ecba8297